BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices

Abstract

The emergence and growing popularity of multimodal large language models (MLLMs) have significant potential to enhance various aspects of daily life, from improving communication to facilitating learning and problem-solving. Mobile phones, as essential daily companions, represent the most effective and accessible deployment platform for MLLMs, enabling seamless integration into everyday tasks. However, deploying MLLMs on mobile phones is challenging because of limited memory size and computational capability, which makes smooth, real-time processing difficult to achieve without extensive optimization. In this paper, we present BlueLM-V-3B, an algorithm and system co-design approach specifically tailored for the efficient deployment of MLLMs on mobile platforms. Specifically, we redesign the dynamic resolution scheme adopted by mainstream MLLMs and implement system optimizations for hardware-aware deployment to optimize model inference on mobile phones. BlueLM-V-3B has the following key highlights: (1) Small Size: BlueLM-V-3B features a language model with 2.7B parameters and a vision encoder with 400M parameters. (2) Fast Speed: BlueLM-V-3B achieves a generation speed of 24.4 tokens/s on the MediaTek Dimensity 9300 processor with 4-bit LLM weight quantization. (3) Strong Performance: BlueLM-V-3B attains the highest average score, 66.1, on the OpenCompass benchmark among models with ≤ 4B parameters and surpasses a series of models with much larger parameter sizes (e.g., MiniCPM-V-2.6, InternVL2-8B).
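To put the size and speed figures in perspective, the following is a minimal back-of-envelope sketch in Python that uses only the numbers quoted in the abstract. The helper names (`weight_bytes`, `generation_seconds`) are hypothetical, the 16-bit baseline is an assumed point of comparison rather than a configuration stated in the abstract, and the estimate ignores quantization metadata (scales, zero points), activations, and the KV cache, so it is a lower bound on weight storage rather than a measured memory footprint.

```python
# Back-of-envelope figures derived only from numbers quoted in the abstract.
LLM_PARAMS = 2.7e9         # language model parameters (from the abstract)
VIT_PARAMS = 400e6         # vision encoder parameters (from the abstract)
TOKENS_PER_SECOND = 24.4   # reported generation speed (from the abstract)


def weight_bytes(num_params: float, bits_per_weight: int) -> float:
    """Raw weight storage in bytes, ignoring quantization metadata."""
    return num_params * bits_per_weight / 8


def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to decode num_tokens at a steady rate (excludes image encoding and prefill)."""
    return num_tokens / tokens_per_second


if __name__ == "__main__":
    total_params = LLM_PARAMS + VIT_PARAMS
    llm_4bit_gb = weight_bytes(LLM_PARAMS, 4) / 1e9
    # Assumption: a 16-bit baseline, used here only for comparison.
    llm_16bit_gb = weight_bytes(LLM_PARAMS, 16) / 1e9

    print(f"Total parameters: {total_params / 1e9:.1f}B")
    print(f"LLM weights at 4 bits:  {llm_4bit_gb:.2f} GB")
    print(f"LLM weights at 16 bits: {llm_16bit_gb:.2f} GB")
    print(f"100 tokens take about {generation_seconds(100, TOKENS_PER_SECOND):.1f} s")
```

Under these assumptions, 4-bit quantization cuts the language model's raw weight storage to a quarter of the 16-bit figure, which is the kind of reduction that matters within a phone's memory budget.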
