Can LLMs Maintain Fundamental Abilities under KV Cache Compression?
February 4, 2025
Authors: Xiang Liu, Zhenheng Tang, Hong Chen, Peijie Dong, Zeyu Li, Xiuze Zhou, Bo Li, Xuming Hu, Xiaowen Chu
cs.AI
Abstract
This paper investigates an under-explored challenge in large language models
(LLMs): the impact of KV cache compression methods on LLMs' fundamental
capabilities. While existing methods achieve impressive compression ratios on
long-context benchmarks, their effects on core model capabilities remain
understudied. We present a comprehensive empirical study evaluating prominent
KV cache compression methods across diverse tasks, spanning world knowledge,
commonsense reasoning, arithmetic reasoning, code generation, safety, and
long-context understanding and generation. Our analysis reveals that KV cache
compression methods exhibit task-specific performance degradation. Arithmetic
reasoning tasks prove particularly sensitive to aggressive compression, with
different methods showing performance drops of 17.4%-43.3%. Notably, the
DeepSeek R1 Distill model exhibits more robust compression tolerance compared
to instruction-tuned models, showing only 9.67%-25.53% performance
degradation. Based on our analysis of attention patterns and cross-task
compression performance, we propose ShotKV, a novel compression approach that
distinctly handles prefill and decoding phases while maintaining shot-level
semantic coherence. Empirical results show that ShotKV achieves 9%-18%
performance improvements on long-context generation tasks under aggressive
compression ratios.
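The abstract does not spell out ShotKV's algorithm. As a rough illustration of the "shot-level semantic coherence" idea, the sketch below prunes a prefill KV cache at the granularity of whole few-shot demonstrations ("shots") rather than individual tokens. The function name, the attention-based scoring, and the greedy budget rule are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def shot_level_compress(attn_scores, shot_spans, budget):
    """Select which prompt tokens' KV entries to keep, pruning whole shots
    so that no few-shot demonstration is split mid-way.

    attn_scores: per-token importance (e.g., attention mass received), shape [seq_len]
    shot_spans:  list of (start, end) token index ranges, one per shot
    budget:      maximum number of prompt tokens whose KV entries are kept
    """
    # Score each shot by its mean per-token importance (an assumed heuristic).
    shot_scores = [(attn_scores[s:e].mean(), (s, e)) for s, e in shot_spans]

    # Greedily keep the highest-scoring shots that still fit the token budget.
    keep, used = [], 0
    for score, (s, e) in sorted(shot_scores, key=lambda x: -x[0]):
        if used + (e - s) <= budget:
            keep.append((s, e))
            used += e - s

    # Return kept token indices in their original order.
    return sorted(i for s, e in keep for i in range(s, e))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    attn = rng.random(120)                      # toy per-token importance scores
    shots = [(0, 40), (40, 80), (80, 120)]      # three 40-token demonstrations
    kept = shot_level_compress(attn, shots, budget=80)
    print(f"kept {len(kept)} of 120 prompt tokens across whole shots")
```

The design point this sketch tries to capture is that dropping tokens per-shot (rather than globally) preserves each retained demonstration intact, which is one plausible reading of why shot-level pruning would help long-context generation tasks under aggressive compression.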