BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices

November 16, 2024
Authors: Xudong Lu, Yinghao Chen, Cheng Chen, Hui Tan, Boheng Chen, Yina Xie, Rui Hu, Guanxin Tan, Renshou Wu, Yan Hu, Yi Zeng, Lei Wu, Liuyang Bian, Zhaoxiong Wang, Long Liu, Yanzhou Yang, Han Xiao, Aojun Zhou, Yafei Wen, Xiaoxin Chen, Shuai Ren, Hongsheng Li
cs.AI

Abstract

The emergence and growing popularity of multimodal large language models (MLLMs) have significant potential to enhance various aspects of daily life, from improving communication to facilitating learning and problem-solving. Mobile phones, as essential daily companions, represent the most effective and accessible deployment platform for MLLMs, enabling seamless integration into everyday tasks. However, deploying MLLMs on mobile phones is challenging because of limited memory size and computational capability, which makes smooth, real-time processing difficult to achieve without extensive optimization. In this paper, we present BlueLM-V-3B, an algorithm and system co-design approach tailored for the efficient deployment of MLLMs on mobile platforms. Specifically, we redesign the dynamic resolution scheme adopted by mainstream MLLMs and implement system optimizations for hardware-aware deployment to improve model inference on mobile phones. BlueLM-V-3B has the following key highlights: (1) Small Size: BlueLM-V-3B features a language model with 2.7B parameters and a vision encoder with 400M parameters. (2) Fast Speed: BlueLM-V-3B achieves a generation speed of 24.4 tokens/s on the MediaTek Dimensity 9300 processor with 4-bit LLM weight quantization. (3) Strong Performance: BlueLM-V-3B attains the highest average score, 66.1, on the OpenCompass benchmark among models with ≤ 4B parameters, and surpasses a series of models with much larger parameter sizes (e.g., MiniCPM-V-2.6, InternVL2-8B).
