A Silver Bullet or a Compromise for Full Attention? A Comprehensive Study of Gist Token-based Context Compression
December 23, 2024
Authors: Chenlong Deng, Zhisong Zhang, Kelong Mao, Shuaiyi Li, Xinting Huang, Dong Yu, Zhicheng Dou
cs.AI
Abstract
In this work, we provide a thorough investigation of gist-based context
compression methods to improve long-context processing in large language
models. We focus on two key questions: (1) How well can these methods replace
full attention models? and (2) What potential failure patterns arise due to
compression? Through extensive experiments, we show that while gist-based
compression can achieve near-lossless performance on tasks like
retrieval-augmented generation and long-document QA, it faces challenges in
tasks like synthetic recall. Furthermore, we identify three key failure
patterns: lost by the boundary, lost if surprise, and lost along the way. To
mitigate these issues, we propose two effective strategies: fine-grained
autoencoding, which enhances the reconstruction of original token information,
and segment-wise token importance estimation, which adjusts optimization based
on token dependencies. Our work provides valuable insights into the
understanding of gist token-based context compression and offers practical
strategies for improving compression capabilities.
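The sketch below illustrates the general idea behind gist token-based context compression as described in the abstract: a long context is split into segments, a few learnable gist tokens are appended to each segment and allowed to attend to it, and only the gist-token hidden states are kept as the compressed memory that replaces full attention over the original tokens. This is a minimal, self-contained mock-up, not the paper's implementation; the class and parameter names (GistCompressor, n_gist, seg_len, d_model) are assumptions, and in practice the gist tokens would be inserted into a pretrained LLM rather than a single stand-in transformer layer.

```python
# Hypothetical sketch of gist token-based context compression (not the
# paper's code). Each segment of the context is compressed into n_gist
# hidden states that later serve as a short "memory" for generation.
import torch
import torch.nn as nn


class GistCompressor(nn.Module):
    """Compress each context segment into a small set of gist-token states."""

    def __init__(self, d_model: int = 64, n_heads: int = 4, n_gist: int = 4):
        super().__init__()
        self.n_gist = n_gist
        # Learnable gist-token embeddings appended to every segment.
        self.gist_embed = nn.Parameter(torch.randn(n_gist, d_model) * 0.02)
        # Stand-in for a transformer layer of the underlying LLM.
        self.layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )

    def forward(self, segment: torch.Tensor) -> torch.Tensor:
        """segment: (batch, seg_len, d_model) -> (batch, n_gist, d_model)."""
        batch = segment.size(0)
        gist = self.gist_embed.unsqueeze(0).expand(batch, -1, -1)
        # Gist tokens attend to the segment tokens; afterwards only the
        # gist positions are kept as the compressed representation.
        hidden = self.layer(torch.cat([segment, gist], dim=1))
        return hidden[:, -self.n_gist :, :]


if __name__ == "__main__":
    torch.manual_seed(0)
    compressor = GistCompressor()
    seg_len, d_model = 32, 64
    # A long context split into segments; each segment is compressed
    # independently, then the gist states are concatenated as memory.
    context = torch.randn(1, 4 * seg_len, d_model)
    memory = torch.cat(
        [
            compressor(context[:, i : i + seg_len])
            for i in range(0, context.size(1), seg_len)
        ],
        dim=1,
    )
    print(memory.shape)  # 128 context tokens -> 16 gist states (8x compression)
```

In this framing, the paper's fine-grained autoencoding strategy would correspond to adding a reconstruction objective that decodes the original segment tokens back from these gist states, and segment-wise token importance estimation would reweight that objective per token according to its dependencies; the exact losses are defined in the paper, not shown here.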