Implicit Reasoning in Transformers is Reasoning through Shortcuts

March 10, 2025
Authors: Tianhe Lin, Jian Xie, Siyu Yuan, Deqing Yang
cs.AI

Abstract

Test-time compute is emerging as a new paradigm for enhancing language models' complex multi-step reasoning capabilities, as demonstrated by the success of OpenAI's o1 and o3, as well as DeepSeek's R1. Compared to explicit reasoning in test-time compute, implicit reasoning is more inference-efficient, requiring fewer generated tokens. However, why does the advanced reasoning capability fail to emerge in the implicit reasoning style? In this work, we train GPT-2 from scratch on a curated multi-step mathematical reasoning dataset and conduct analytical experiments to investigate how language models perform implicit reasoning in multi-step tasks. Our findings reveal: 1) Language models can perform step-by-step reasoning and achieve high accuracy in both in-domain and out-of-domain tests via implicit reasoning. However, this capability only emerges when trained on fixed-pattern data. 2) Conversely, implicit reasoning abilities emerging from training on unfixed-pattern data tend to overfit a specific pattern and fail to generalize further. Notably, this limitation is also observed in state-of-the-art large language models. These findings suggest that language models acquire implicit reasoning through shortcut learning, enabling strong performance on tasks with similar patterns while lacking generalization.
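The abstract describes training GPT-2 from scratch on curated multi-step mathematical reasoning data, contrasting fixed-pattern with unfixed-pattern premise orderings and supervising only the final answer (implicit reasoning, with no chain-of-thought tokens). The sketch below shows one way such data could be constructed; the variable-chain arithmetic format, the modulus, and the function `make_example` are illustrative assumptions, not the authors' actual data pipeline.

```python
# Illustrative sketch only: the abstract does not specify the dataset format,
# so this variable-chain arithmetic construction is an assumption rather than
# the paper's actual pipeline.
import random


def make_example(num_steps=3, fixed_pattern=True, modulus=23, rng=random):
    """Build one multi-step problem whose answer requires chaining all premises.

    fixed_pattern=True  -> premises appear in the order they must be composed.
    fixed_pattern=False -> premise order is shuffled, so the chain cannot be
                           resolved by relying on token position alone.
    """
    names = [f"x{i}" for i in range(num_steps + 1)]
    value = rng.randrange(modulus)
    premises = [f"{names[0]}={value}"]
    for i in range(num_steps):
        delta = rng.randrange(modulus)
        op = rng.choice(["+", "-"])
        premises.append(f"{names[i + 1]}={names[i]}{op}{delta}")
        value = (value + delta) % modulus if op == "+" else (value - delta) % modulus
    if not fixed_pattern:
        rng.shuffle(premises)
    # Implicit-reasoning supervision: the target is the final answer only,
    # with no intermediate chain-of-thought tokens in between.
    prompt = ",".join(premises) + f",{names[-1]}=?"
    return prompt, str(value)


if __name__ == "__main__":
    rng = random.Random(0)
    for fixed in (True, False):
        prompt, answer = make_example(num_steps=3, fixed_pattern=fixed, rng=rng)
        print("fixed  " if fixed else "unfixed", prompt, "->", answer)
```

Under this kind of setup, the paper's finding would correspond to a model trained on the fixed-pattern variant generalizing across step counts and value ranges, while a model trained on the shuffled variant overfits to the specific premise orderings seen during training.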
