InfiniPot: Infinite Context Processing on Memory-Constrained LLMs
October 2, 2024
Authors: Minsoo Kim, Kyuhong Shim, Jungwook Choi, Simyung Chang
cs.AI
Abstract
Handling long input contexts remains a significant challenge for Large
Language Models (LLMs), particularly in resource-constrained environments such
as mobile devices. Our work aims to address this limitation by introducing
InfiniPot, a novel KV cache control framework designed to enable pre-trained
LLMs to manage extensive sequences within fixed memory constraints efficiently,
without requiring additional training. InfiniPot leverages Continual Context
Distillation (CCD), an iterative process that compresses and retains essential
information through novel importance metrics, effectively maintaining critical
data even without access to future context. Our comprehensive evaluations
indicate that InfiniPot significantly outperforms models trained for long
contexts in various NLP tasks, establishing its efficacy and versatility. This
work represents a substantial advancement toward making LLMs applicable to a
broader range of real-world scenarios.
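The core mechanism the abstract describes — keeping a KV cache within a fixed memory budget by iteratively compressing it and retaining only the most important entries — can be illustrated with a minimal sketch. This is not the paper's actual CCD algorithm or its importance metrics; the class name, the `keep_ratio` parameter, and the caller-supplied importance scores are all assumptions made purely for illustration.

```python
import heapq

class FixedBudgetKVCache:
    """Illustrative fixed-budget KV cache: when the cache fills, it is
    "distilled" by keeping only the highest-importance entries.
    Importance scores are supplied by the caller; the paper's actual
    CCD importance metrics are not reproduced here."""

    def __init__(self, budget, keep_ratio=0.5):
        self.budget = budget                   # max cached (key, value) pairs
        self.keep = int(budget * keep_ratio)   # entries kept after distillation
        self.entries = []                      # (importance, token_id, kv) tuples

    def add(self, token_id, kv, importance):
        self.entries.append((importance, token_id, kv))
        if len(self.entries) > self.budget:
            self._distill()

    def _distill(self):
        # Keep the top-`keep` entries by importance, preserving token order
        # so the compressed cache still reads as a coherent prefix.
        top = heapq.nlargest(self.keep, self.entries, key=lambda e: e[0])
        keep_ids = {id(e) for e in top}
        self.entries = [e for e in self.entries if id(e) in keep_ids]

# Usage: stream 10 tokens through a cache with a budget of 6 entries.
cache = FixedBudgetKVCache(budget=6)
for t in range(10):
    cache.add(t, kv=(f"K{t}", f"V{t}"), importance=t % 4)
assert len(cache.entries) <= 6  # the budget is never exceeded
```

The key property, as in the abstract, is that eviction decisions are made without access to future context: each distillation round sees only the tokens processed so far.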