
Training Large Language Models to Reason in a Continuous Latent Space

December 9, 2024
Authors: Shibo Hao, Sainbayar Sukhbaatar, DiJia Su, Xian Li, Zhiting Hu, Jason Weston, Yuandong Tian
cs.AI

Abstract

Large language models (LLMs) are restricted to reason in the "language space", where they typically express the reasoning process with a chain-of-thought (CoT) to solve a complex reasoning problem. However, we argue that language space may not always be optimal for reasoning. For example, most word tokens are primarily for textual coherence and not essential for reasoning, while some critical tokens require complex planning and pose huge challenges to LLMs. To explore the potential of LLM reasoning in an unrestricted latent space instead of using natural language, we introduce a new paradigm Coconut (Chain of Continuous Thought). We utilize the last hidden state of the LLM as a representation of the reasoning state (termed "continuous thought"). Rather than decoding this into a word token, we feed it back to the LLM as the subsequent input embedding directly in the continuous space. Experiments show that Coconut can effectively augment the LLM on several reasoning tasks. This novel latent reasoning paradigm leads to emergent advanced reasoning patterns: the continuous thought can encode multiple alternative next reasoning steps, allowing the model to perform a breadth-first search (BFS) to solve the problem, rather than prematurely committing to a single deterministic path like CoT. Coconut outperforms CoT in certain logical reasoning tasks that require substantial backtracking during planning, with fewer thinking tokens during inference. These findings demonstrate the promise of latent reasoning and offer valuable insights for future research.
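
The core mechanism is easy to sketch: instead of decoding the final hidden state into a token, it is appended back to the input sequence as the next embedding. Below is a minimal, hypothetical sketch of such a latent step using a Hugging Face causal LM; the model choice (GPT-2), `num_thoughts`, and the decoding comment are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of a Coconut-style latent reasoning loop.
# Assumes a Hugging Face causal LM whose hidden size matches its
# embedding size (true for GPT-2), so hidden states can be fed back
# directly as input embeddings.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "Question: ..."  # placeholder prompt
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
# Start from the prompt's ordinary token embeddings.
inputs_embeds = model.get_input_embeddings()(input_ids)

num_thoughts = 4  # number of latent steps; a hyperparameter, not from the paper
with torch.no_grad():
    for _ in range(num_thoughts):
        out = model(inputs_embeds=inputs_embeds, output_hidden_states=True)
        # The last-layer hidden state at the final position is the
        # "continuous thought".
        thought = out.hidden_states[-1][:, -1:, :]
        # Feed it back as the next input embedding, skipping token decoding.
        inputs_embeds = torch.cat([inputs_embeds, thought], dim=1)

# After the latent steps, one would switch back to language mode and
# decode the answer normally, e.g. greedily:
# next_id = out.logits[:, -1, :].argmax(-1)
```

Because the continuous thought is never collapsed to a single token, it can retain a distribution over alternative next steps, which is what allows the BFS-like search behavior the abstract describes.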
