
Compressed Chain of Thought: Efficient Reasoning Through Dense Representations

December 17, 2024
Authors: Jeffrey Cheng, Benjamin Van Durme
cs.AI

Abstract

Chain-of-thought (CoT) decoding enables language models to improve reasoning performance at the cost of high generation latency in decoding. Recent proposals have explored variants of contemplation tokens, a term we introduce that refers to special tokens used during inference to allow for extra computation. Prior work has considered fixed-length sequences drawn from a discrete set of embeddings as contemplation tokens. Here we propose Compressed Chain-of-Thought (CCoT), a framework to generate contentful and continuous contemplation tokens of variable sequence length. The generated contemplation tokens are compressed representations of explicit reasoning chains, and our method can be applied to off-the-shelf decoder language models. Through experiments, we illustrate how CCoT enables additional reasoning over dense contentful representations to achieve corresponding improvements in accuracy. Moreover, the reasoning improvements can be adaptively modified on demand by controlling the number of contemplation tokens generated.
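The abstract does not give implementation details, so the following is only a minimal, illustrative sketch of the general idea of continuous contemplation tokens: a decoder LM feeds each step's final hidden state back in as the next input embedding to produce k dense "contemplation" embeddings, then decodes the answer conditioned on the query plus those embeddings. It assumes an off-the-shelf Hugging Face decoder model; the function names are hypothetical, and unlike the actual CCoT framework, nothing here is trained to compress gold reasoning chains.

```python
# Illustrative sketch only: untrained contemplation-token generation with an
# off-the-shelf decoder LM. Function names are hypothetical, not from the paper.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # any off-the-shelf decoder language model
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

@torch.no_grad()
def generate_contemplation_tokens(query: str, k: int) -> torch.Tensor:
    """Return k dense contemplation embeddings of shape (1, k, hidden_size)."""
    ids = tok(query, return_tensors="pt").input_ids
    embeds = model.get_input_embeddings()(ids)            # (1, T, H)
    contemplation = []
    for _ in range(k):
        out = model(inputs_embeds=embeds, output_hidden_states=True)
        last_hidden = out.hidden_states[-1][:, -1:, :]     # (1, 1, H) final hidden state
        contemplation.append(last_hidden)
        embeds = torch.cat([embeds, last_hidden], dim=1)   # feed it back as the next input
    return torch.cat(contemplation, dim=1)

@torch.no_grad()
def answer_with_ccot(query: str, k: int = 8, max_new_tokens: int = 32) -> str:
    """Greedy-decode an answer conditioned on the query plus k contemplation tokens."""
    ids = tok(query, return_tensors="pt").input_ids
    query_embeds = model.get_input_embeddings()(ids)
    ctx = torch.cat([query_embeds, generate_contemplation_tokens(query, k)], dim=1)
    generated = []
    for _ in range(max_new_tokens):
        logits = model(inputs_embeds=ctx).logits[:, -1, :]
        next_id = logits.argmax(dim=-1, keepdim=True)      # (1, 1)
        generated.append(next_id)
        if next_id.item() == tok.eos_token_id:
            break
        ctx = torch.cat([ctx, model.get_input_embeddings()(next_id)], dim=1)
    return tok.decode(torch.cat(generated, dim=1)[0], skip_special_tokens=True)

print(answer_with_ccot("Q: If I have 3 apples and buy 5 more, how many do I have? A:", k=8))
```

The parameter k plays the role described in the abstract: increasing the number of generated contemplation tokens buys more computation (and, with trained compression, more of the reasoning chain's content) at the cost of extra decoding steps.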
