Compressed Chain of Thought: Efficient Reasoning Through Dense Representations

December 17, 2024
Authors: Jeffrey Cheng, Benjamin Van Durme
cs.AI

Abstract

Chain-of-thought (CoT) decoding enables language models to improve reasoning performance at the cost of high generation latency in decoding. Recent proposals have explored variants of contemplation tokens, a term we introduce that refers to special tokens used during inference to allow for extra computation. Prior work has considered fixed-length sequences drawn from a discrete set of embeddings as contemplation tokens. Here we propose Compressed Chain-of-Thought (CCoT), a framework to generate contentful and continuous contemplation tokens of variable sequence length. The generated contemplation tokens are compressed representations of explicit reasoning chains, and our method can be applied to off-the-shelf decoder language models. Through experiments, we illustrate how CCoT enables additional reasoning over dense contentful representations to achieve corresponding improvements in accuracy. Moreover, the reasoning improvements can be adaptively modified on demand by controlling the number of contemplation tokens generated.
