Awaker2.5-VL: Stably Scaling MLLMs with Parameter-Efficient Mixture of Experts
November 16, 2024
Authors: Jinqiang Long, Yanqi Dai, Guoxing Yang, Hongpeng Lin, Nanyi Fei, Yizhao Gao, Zhiwu Lu
cs.AI
Abstract
As research on Multimodal Large Language Models (MLLMs) becomes popular,
an advanced MLLM is typically required to handle various textual and
visual tasks (e.g., VQA, Detection, OCR, and ChartQA) simultaneously for
real-world applications. However, due to the significant differences in
representation and distribution among data from various tasks, simply mixing
data of all tasks together leads to the well-known "multi-task conflict" issue,
resulting in performance degradation across various tasks. To address this
issue, we propose Awaker2.5-VL, a Mixture of Experts (MoE) architecture
suitable for MLLM, which acquires the multi-task capabilities through multiple
sparsely activated experts. To speed up the training and inference of
Awaker2.5-VL, each expert in our model is designed as a low-rank adaptation
(LoRA) structure. Extensive experiments on multiple recent benchmarks
demonstrate the effectiveness of Awaker2.5-VL. The code and model weights are
released on our project page: https://github.com/MetabrainAGI/Awaker.
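
The core idea described in the abstract, a frozen base layer augmented with a router that sparsely activates a small set of LoRA-style experts, can be illustrated with a short PyTorch sketch. This is a minimal, hypothetical illustration: the class names (LoRAExpert, MoELoRALinear), the softmax top-k routing, and all hyperparameters are assumptions made for exposition, not the authors' released design; see the project repository above for the actual implementation.

```python
# Minimal sketch of a Mixture-of-Experts layer whose experts are LoRA adapters
# on top of a frozen base linear layer. Hypothetical illustration only: names,
# routing scheme, and hyperparameters are assumptions, not the paper's design.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LoRAExpert(nn.Module):
    """One expert: a low-rank update (B @ A) added to the frozen base output."""

    def __init__(self, d_in: int, d_out: int, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.lora_a = nn.Linear(d_in, rank, bias=False)   # down-projection
        self.lora_b = nn.Linear(rank, d_out, bias=False)  # up-projection
        nn.init.zeros_(self.lora_b.weight)                # experts start as a no-op
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.lora_b(self.lora_a(x)) * self.scale


class MoELoRALinear(nn.Module):
    """A frozen linear layer plus sparsely activated LoRA experts."""

    def __init__(self, base: nn.Linear, num_experts: int = 4, top_k: int = 1):
        super().__init__()
        self.base = base
        for p in self.base.parameters():                  # only experts and router train
            p.requires_grad_(False)
        self.experts = nn.ModuleList(
            LoRAExpert(base.in_features, base.out_features) for _ in range(num_experts)
        )
        self.router = nn.Linear(base.in_features, num_experts, bias=False)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        base_out = self.base(x)                           # frozen path
        gate = F.softmax(self.router(x), dim=-1)          # [..., num_experts]
        top_w, top_idx = gate.topk(self.top_k, dim=-1)    # keep only top-k experts
        moe_out = torch.zeros_like(base_out)
        for e, expert in enumerate(self.experts):
            slot = (top_idx == e)                         # routing slots that chose e
            if not slot.any():
                continue                                  # expert inactive for this batch
            w = (top_w * slot).sum(dim=-1, keepdim=True)  # 0 for tokens not routed to e
            moe_out = moe_out + w * expert(x)             # dense compute for clarity
        return base_out + moe_out
```

In a full MLLM, a layer like this would typically wrap the attention or MLP projections of the language backbone, so that only the small LoRA matrices and the router receive gradients during multi-task training, which is what keeps the MoE parameter-efficient; the actual Awaker2.5-VL routing strategy and training recipe may differ.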