LightThinker: Thinking Step-by-Step Compression
February 21, 2025
Authors: Jintian Zhang, Yuqi Zhu, Mengshu Sun, Yujie Luo, Shuofei Qiao, Lun Du, Da Zheng, Huajun Chen, Ningyu Zhang
cs.AI
Abstract
Large language models (LLMs) have shown remarkable performance in complex
reasoning tasks, but their efficiency is hindered by the substantial memory and
computational costs associated with generating lengthy tokens. In this paper,
we propose LightThinker, a novel method that enables LLMs to dynamically
compress intermediate thoughts during reasoning. Inspired by human cognitive
processes, LightThinker compresses verbose thought steps into compact
representations and discards the original reasoning chains, thereby
significantly reducing the number of tokens stored in the context window. This
is achieved by training the model on when and how to perform compression
through data construction, mapping hidden states to condensed gist tokens, and
creating specialized attention masks. Additionally, we introduce the Dependency
(Dep) metric to quantify the degree of compression by measuring the reliance on
historical tokens during generation. Extensive experiments on four datasets and
two models show that LightThinker reduces peak memory usage and inference time,
while maintaining competitive accuracy. Our work provides a new direction for
improving the efficiency of LLMs in complex reasoning tasks without sacrificing
performance. Code will be released at https://github.com/zjunlp/LightThinker.
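To make the idea concrete, the sketch below shows in plain NumPy how an attention mask with gist-token compression might look, and how a Dep-style dependency count could be read off it. The token layout, the `build_compressed_mask` and `dependency` helpers, and the one-gist-per-step assumption are illustrative choices made here, not the paper's implementation; the released code is the authoritative reference.

```python
import numpy as np

def build_compressed_mask(segments, gists_per_segment=1):
    """Toy causal mask for gist-token compression (hypothetical layout).

    `segments` lists the lengths of intermediate thought steps. Each step
    is followed by `gists_per_segment` gist tokens that summarize it; once
    a step is compressed, later tokens may attend only to its gist tokens,
    not to the raw thought tokens.
    """
    total = sum(segments) + gists_per_segment * len(segments)
    mask = np.zeros((total, total), dtype=bool)  # True = position j is visible to i

    pos = 0
    earlier_gists = []  # gist positions from already-compressed steps
    for seg_len in segments:
        seg_start = pos
        gist_start = pos + seg_len
        step_end = gist_start + gists_per_segment  # exclusive

        for i in range(seg_start, step_end):
            mask[i, earlier_gists] = True       # compressed history stays visible
            mask[i, seg_start:i + 1] = True     # causal attention inside the current step

        earlier_gists.extend(range(gist_start, step_end))  # raw step tokens drop out of view
        pos = step_end

    return mask

def dependency(mask):
    """Dep-style measure: total number of (token, visible-token) pairs."""
    return int(mask.sum())

# Example: three thought steps of 5, 4, and 6 tokens, one gist token each.
compressed = build_compressed_mask([5, 4, 6])
full_causal = np.tril(np.ones_like(compressed))
print(dependency(compressed), dependency(full_causal))  # prints 83 171
```

In this toy setting the compressed mask roughly halves the dependency count relative to plain causal attention over the same tokens; in the abstract's terms, the raw thought tokens can be discarded once their gist tokens are written, which is what reduces the tokens kept in the context window.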