The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve?
February 24, 2025
Authors: Zhenheng Tang, Xiang Liu, Qian Wang, Peijie Dong, Bingsheng He, Xiaowen Chu, Bo Li
cs.AI
Abstract
Motivated by the goal of reducing the computational and storage costs of LLMs, model compression and KV cache compression have attracted considerable attention from researchers. However, current methods predominantly focus on maintaining the performance of compressed LLMs as measured by perplexity, or by simple accuracy on commonsense question-answering and basic arithmetic reasoning tasks. In this blog post, we present a brief review of recent advancements in LLMs related to retrieval-augmented generation, multi-step reasoning, external tools, and computational expressivity, all of which substantially enhance LLM performance. We then propose the lottery LLM hypothesis: for a given LLM and task, there exists a smaller lottery LLM capable of matching the performance of the original LLM with the assistance of multi-step reasoning and external tools. Based on this review of current progress in LLMs, we discuss and summarize the essential capabilities that the lottery LLM and KV cache compression must possess, capabilities that are currently overlooked by existing methods.
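Since the abstract's critique centers on perplexity as the dominant metric for judging compressed LLMs, a minimal sketch of how perplexity is computed from per-token log-probabilities may help; the `toy_log_probs` values below are invented purely for illustration, not taken from the paper:

```python
import math

def perplexity(token_log_probs):
    """Perplexity is the exponential of the average negative
    log-likelihood per token: exp(-(1/N) * sum(log p_i))."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

# Made-up per-token log-probabilities for a short sequence; a real
# evaluation would take these from the (compressed) model itself.
toy_log_probs = [-1.2, -0.4, -2.3, -0.9]
print(f"perplexity = {perplexity(toy_log_probs):.2f}")  # ~3.32
```

A compressed model can keep this number low while still losing abilities such as retrieval, tool use, or multi-step reasoning, which is precisely the gap the hypothesis highlights.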
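To make the hypothesis concrete, here is a minimal, self-contained sketch, not from the paper; the names `KNOWLEDGE_BASE`, `calculator`, and `lottery_llm` are invented for illustration. It captures the intuition that a smaller model which retrieves facts and offloads arithmetic to external tools can answer queries that would otherwise require a larger standalone model:

```python
import ast
import operator

# Hypothetical external resources standing in for the systems the
# hypothesis assumes: a retriever over an external knowledge store
# and a calculator tool for exact arithmetic.
KNOWLEDGE_BASE = {"capital of france": "Paris"}

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}

def calculator(expression: str) -> float:
    """Evaluate a basic arithmetic expression exactly (external tool)."""
    def ev(node):
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError("unsupported expression")
    return ev(ast.parse(expression, mode="eval").body)

def lottery_llm(query: str) -> str:
    """Toy 'small LLM' that plans one step and delegates to tools
    rather than answering everything from its own parameters."""
    q = query.lower().rstrip("?")
    if q in KNOWLEDGE_BASE:            # step 1: retrieve external facts
        return KNOWLEDGE_BASE[q]
    try:                               # step 2: offload exact arithmetic
        return str(calculator(query))
    except (ValueError, SyntaxError):
        return "needs a larger model or more tools"

print(lottery_llm("capital of France?"))  # -> Paris
print(lottery_llm("12 * (3 + 4)"))        # -> 84
```

In the paper's framing, the knowledge store and calculator stand in for retrieval-augmented generation and external tool use, and the routing step stands in for multi-step reasoning; the point is that compression methods should preserve these orchestration abilities rather than only the raw accuracy captured by perplexity-style metrics.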