The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve?

Abstract

Model compression and KV cache compression have attracted considerable attention as ways to reduce the computational and storage costs of large language models (LLMs). However, current methods focus mainly on preserving the performance of compressed LLMs as measured by perplexity or by simple accuracy on common-sense knowledge QA and basic arithmetic reasoning tasks. In this blog post, we briefly review recent advances in retrieval-augmented generation, multi-step reasoning, external tools, and computational expressivity, all of which substantially enhance LLM performance. We then propose the lottery LLM hypothesis: for a given LLM and task, there exists a smaller lottery LLM that can match the performance of the original LLM with the help of multi-step reasoning and external tools. Based on this review, we discuss and summarize the essential capabilities that the lottery LLM and KV cache compression must possess, which existing methods currently overlook.
