A Silver Bullet or a Compromise for Full Attention? A Comprehensive Study of Gist Token-based Context Compression

Abstract

In this work, we provide a thorough investigation of gist-based context compression methods to improve long-context processing in large language models. We focus on two key questions: (1) How well can these methods replace full attention models? and (2) What potential failure patterns arise due to compression? Through extensive experiments, we show that while gist-based compression can achieve near-lossless performance on tasks like retrieval-augmented generation and long-document QA, it faces challenges in tasks like synthetic recall. Furthermore, we identify three key failure patterns: lost by the boundary, lost if surprise, and lost along the way. To mitigate these issues, we propose two effective strategies: fine-grained autoencoding, which enhances the reconstruction of original token information, and segment-wise token importance estimation, which adjusts optimization based on token dependencies. Our work provides valuable insights into the understanding of gist token-based context compression and offers practical strategies for improving compression capabilities.
