
A Silver Bullet or a Compromise for Full Attention? A Comprehensive Study of Gist Token-based Context Compression

December 23, 2024
Authors: Chenlong Deng, Zhisong Zhang, Kelong Mao, Shuaiyi Li, Xinting Huang, Dong Yu, Zhicheng Dou
cs.AI

Abstract

In this work, we provide a thorough investigation of gist-based context compression methods to improve long-context processing in large language models. We focus on two key questions: (1) How well can these methods replace full attention models? and (2) What potential failure patterns arise due to compression? Through extensive experiments, we show that while gist-based compression can achieve near-lossless performance on tasks like retrieval-augmented generation and long-document QA, it faces challenges in tasks like synthetic recall. Furthermore, we identify three key failure patterns: lost by the boundary, lost if surprise, and lost along the way. To mitigate these issues, we propose two effective strategies: fine-grained autoencoding, which enhances the reconstruction of original token information, and segment-wise token importance estimation, which adjusts optimization based on token dependencies. Our work provides valuable insights into the understanding of gist token-based context compression and offers practical strategies for improving compression capabilities.
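
To make the core idea concrete, the sketch below compresses a segment of context hidden states into a handful of learned gist states that downstream generation would attend to in place of the full segment. This is a minimal illustration under stated assumptions: the class name GistCompressor, the dimensions, and the single cross-attention layer are placeholders for exposition, not the fine-tuned LLM architecture evaluated in the paper.

```python
# Minimal sketch of gist token-based context compression (illustrative only;
# all names and dimensions here are assumptions, not the authors' implementation).
import torch
import torch.nn as nn

class GistCompressor(nn.Module):
    """Compress a segment of hidden states into a few learned gist states."""
    def __init__(self, d_model: int = 256, n_heads: int = 4, n_gist: int = 4):
        super().__init__()
        # Learned gist embeddings that act as queries over the raw segment.
        self.gist_queries = nn.Parameter(torch.randn(n_gist, d_model) * 0.02)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, segment: torch.Tensor) -> torch.Tensor:
        # segment: (batch, seg_len, d_model) hidden states of the raw context.
        batch = segment.size(0)
        q = self.gist_queries.unsqueeze(0).expand(batch, -1, -1)
        # Gist tokens attend to the full segment once; afterwards the decoder
        # would see only these few states instead of all seg_len token states.
        gist_states, _ = self.attn(q, segment, segment)
        return gist_states  # (batch, n_gist, d_model)

if __name__ == "__main__":
    compressor = GistCompressor()
    ctx = torch.randn(2, 128, 256)   # 128 context token states per example
    gists = compressor(ctx)          # compressed to 4 gist states
    print(gists.shape)               # torch.Size([2, 4, 256])
```

The compression ratio here (128 context states to 4 gist states) is arbitrary; the paper's failure patterns (lost by the boundary, lost if surprise, lost along the way) concern what information survives this kind of bottleneck.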
