目標:通過標記合併和修剪實現多模式LLM的自適應推理
AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning
December 4, 2024
作者: Yiwu Zhong, Zhuoming Liu, Yin Li, Liwei Wang
cs.AI
摘要
大型語言模型(LLMs)已經使多模式LLMs的創建成為可能,這些模型展現出對視覺數據(如圖像和視頻)的強大理解能力。然而,這些模型通常依賴來自視覺編碼器的大量視覺標記,導致高計算需求,這限制了它們在資源受限環境和長篇文本任務中的應用。在這項工作中,我們提出了一種無需訓練的適應性推理方法,適用於多模式LLMs,可以滿足廣泛的效率要求,並最小化性能下降。我們的方法包括:a)在LLMs之前基於嵌入相似性進行迭代標記合併,以及b)基於多模式重要性在LLM層內進行漸進式標記修剪。通過極簡設計,我們的方法可應用於視頻和圖像LLMs。在各種視頻和圖像基準測試上進行的大量實驗表明,我們的方法大幅減少了計算負載(例如,在FLOPs上減少了7倍),同時保持了視頻和圖像LLMs的性能。此外,在類似的計算成本下,我們的方法在長視頻理解方面優於最先進的方法(例如,在MLVU上+4.6)。此外,我們的深入分析提供了關於標記冗餘性和LLM層行為的見解,為未來設計高效多模式LLMs的研究提供指導。我們的程式碼將在https://github.com/LaVi-Lab/AIM 上提供。
English
Large language models (LLMs) have enabled the creation of multi-modal LLMs
that exhibit strong comprehension of visual data such as images and videos.
However, these models usually rely on extensive visual tokens from visual
encoders, leading to high computational demands, which limits their
applicability in resource-constrained environments and for long-context tasks.
In this work, we propose a training-free adaptive inference method for
multi-modal LLMs that can accommodate a broad range of efficiency requirements
with a minimum performance drop. Our method consists of a) iterative token
merging based on embedding similarity before LLMs, and b) progressive token
pruning within LLM layers based on multi-modal importance. With a minimalist
design, our method can be applied to both video and image LLMs. Extensive
experiments on diverse video and image benchmarks demonstrate that, our method
substantially reduces computation load (e.g., a 7-fold reduction in
FLOPs) while preserving the performance of video and image LLMs. Further, under
a similar computational cost, our method outperforms the state-of-the-art
methods in long video understanding (e.g., +4.6 on MLVU).
Additionally, our in-depth analysis provides insights into token redundancy and
LLM layer behaviors, offering guidance for future research in designing
efficient multi-modal LLMs. Our code will be available at
https://github.com/LaVi-Lab/AIM.Summary
AI-Generated Summary
1比特LLM時代:所有大型語言模型都在1.58比特。The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
1比特LLM時代:所有大型語言模型都在1.58比特。
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Shuming Ma, Hongyu Wang, Lingxiao Ma, Lei Wang, Wenhui Wang, Shaohan Huang, Li Dong, Ruiping Wang, Jilong Xue, Furu Wei•Feb 27, 2024•612142
DeepSeek-R1:通過強化學習激勵LLM中的推理能力DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via
Reinforcement Learning
DeepSeek-R1:通過強化學習激勵LLM中的推理能力
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via
Reinforcement Learning
DeepSeek-AI, Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, Xiaokang Zhang, Xingkai Yu, Yu Wu, Z. F. Wu, Zhibin Gou, Zhihong Shao, Zhuoshu Li, Ziyi Gao, Aixin Liu, Bing Xue, Bingxuan Wang, Bochao Wu, Bei Feng, Chengda Lu, Chenggang Zhao, Chengqi Deng, Chenyu Zhang, Chong Ruan, Damai Dai, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fucong Dai, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Han Bao, Hanwei Xu, Haocheng Wang, Honghui Ding, Huajian Xin, Huazuo Gao, Hui Qu, Hui Li, Jianzhong Guo, Jiashi Li, Jiawei Wang, Jingchang Chen, Jingyang Yuan, Junjie Qiu, Junlong Li, J. L. Cai, Jiaqi Ni, Jian Liang, Jin Chen, Kai Dong, Kai Hu, Kaige Gao, Kang Guan, Kexin Huang, Kuai Yu, Lean Wang, Lecong Zhang, Liang Zhao, Litong Wang, Liyue Zhang, Lei Xu, Leyi Xia, Mingchuan Zhang, Minghua Zhang, Minghui Tang, Meng Li, Miaojun Wang, Mingming Li, Ning Tian, Panpan Huang, Peng Zhang, Qiancheng Wang, Qinyu Chen, Qiushi Du, Ruiqi Ge, Ruisong Zhang, Ruizhe Pan, Runji Wang, R. J. Chen, R. L. Jin, Ruyi Chen, Shanghao Lu, Shangyan Zhou, Shanhuang Chen, Shengfeng Ye, Shiyu Wang, Shuiping Yu, Shunfeng Zhou, Shuting Pan, S. S. Li, Shuang Zhou, Shaoqing Wu, Shengfeng Ye, Tao Yun, Tian Pei, Tianyu Sun, T. Wang, Wangding Zeng, Wanjia Zhao, Wen Liu, Wenfeng Liang, Wenjun Gao, Wenqin Yu, Wentao Zhang, W. L. Xiao, Wei An, Xiaodong Liu, Xiaohan Wang, Xiaokang Chen, Xiaotao Nie, Xin Cheng, Xin Liu, Xin Xie, Xingchao Liu, Xinyu Yang, Xinyuan Li, Xuecheng Su, Xuheng Lin, X. Q. Li, Xiangyue Jin, Xiaojin Shen, Xiaosha Chen, Xiaowen Sun, Xiaoxiang Wang, Xinnan Song, Xinyi Zhou, Xianzu Wang, Xinxia Shan, Y. K. Li, Y. Q. Wang, Y. X. Wei, Yang Zhang, Yanhong Xu, Yao Li, Yao Zhao, Yaofeng Sun, Yaohui Wang, Yi Yu, Yichao Zhang, Yifan Shi, Yiliang Xiong, Ying He, Yishi Piao, Yisong Wang, Yixuan Tan, Yiyang Ma, Yiyuan Liu, Yongqiang Guo, Yuan Ou, Yuduan Wang, Yue Gong, Yuheng Zou, Yujia He, Yunfan Xiong, Yuxiang Luo, Yuxiang You, Yuxuan Liu, Yuyang Zhou, Y. X. Zhu, Yanhong Xu, Yanping Huang, Yaohui Li, Yi Zheng, Yuchen Zhu, Yunxian Ma, Ying Tang, Yukun Zha, Yuting Yan, Z. Z. Ren, Zehui Ren, Zhangli Sha, Zhe Fu, Zhean Xu, Zhenda Xie, Zhengyan Zhang, Zhewen Hao, Zhicheng Ma, Zhigang Yan, Zhiyu Wu, Zihui Gu, Zijia Zhu, Zijun Liu, Zilin Li, Ziwei Xie, Ziyang Song, Zizheng Pan, Zhen Huang, Zhipeng Xu, Zhongyu Zhang, Zhen Zhang•Jan 22, 2025•3685
Qwen2.5 技術報告Qwen2.5 Technical Report
Qwen2.5 技術報告
Qwen2.5 Technical Report
Qwen, An Yang, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Haoran Wei, Huan Lin, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxi Yang, Jingren Zhou, Junyang Lin, Kai Dang, Keming Lu, Keqin Bao, Kexin Yang, Le Yu, Mei Li, Mingfeng Xue, Pei Zhang, Qin Zhu, Rui Men, Runji Lin, Tianhao Li, Tingyu Xia, Xingzhang Ren, Xuancheng Ren, Yang Fan, Yang Su, Yichang Zhang, Yu Wan, Yuqiong Liu, Zeyu Cui, Zhenru Zhang, Zihan Qiu•Dec 19, 2024•36311