General Reasoning Requires Learning to Reason from the Get-go
February 26, 2025
Authors: Seungwook Han, Jyothish Pari, Samuel J. Gershman, Pulkit Agrawal
cs.AI
Abstract
Large Language Models (LLMs) have demonstrated impressive real-world utility,
exemplifying artificial useful intelligence (AUI). However, their ability to
reason adaptively and robustly -- the hallmarks of artificial general
intelligence (AGI) -- remains fragile. While LLMs seemingly succeed in
commonsense reasoning, programming, and mathematics, they struggle to
generalize algorithmic understanding across novel contexts. Our experiments
with algorithmic tasks in esoteric programming languages reveal that LLMs'
reasoning overfits to the training data and is limited in its transferability.
We hypothesize that the core issue underlying such limited transferability is
the coupling of reasoning and knowledge in LLMs.
To transition from AUI to AGI, we propose disentangling knowledge and
reasoning through three key directions: (1) pretraining to reason using RL from
scratch as an alternative to the widely used next-token prediction pretraining,
(2) using a curriculum of synthetic tasks to ease the learning of a
reasoning prior for RL that can then be transferred to natural
language tasks, and (3) learning more generalizable reasoning functions using a
small context window to reduce exploiting spurious correlations between tokens.
Such a reasoning system coupled with a trained retrieval system and a large
external memory bank as a knowledge store can overcome several limitations of
existing architectures at learning to reason in novel scenarios.
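
The closing sentence describes the proposed architecture only at a high level. As a purely illustrative sketch (not the authors' system), the toy Python below shows the intended shape of such a decoupled setup: a reasoning module restricted to a small working context that must fetch facts from an external memory bank via a retrieval step rather than storing them in its own parameters. The class names, the word-overlap retrieval, and the placeholder reasoning step are assumptions introduced here for illustration only.

class MemoryBank:
    """External knowledge store: maps short text keys to stored facts."""

    def __init__(self, facts: dict):
        self.facts = facts

    def retrieve(self, query: str, k: int = 1) -> list:
        # Toy retrieval: rank stored keys by word overlap with the query.
        def overlap(key: str) -> int:
            return len(set(key.lower().split()) & set(query.lower().split()))
        ranked = sorted(self.facts, key=overlap, reverse=True)
        return [self.facts[key] for key in ranked[:k]]


class SmallContextReasoner:
    """Hypothetical reasoner limited to a small working context.

    The tight context budget forces it to decide what to retrieve from
    external memory instead of relying on long-range correlations across
    a very large prompt.
    """

    def __init__(self, memory: MemoryBank, max_tokens: int = 32):
        self.memory = memory
        self.max_tokens = max_tokens

    def answer(self, question: str) -> str:
        retrieved = self.memory.retrieve(question, k=1)
        # Build a working context within the small token budget.
        context = " ".join(retrieved + [question]).split()[: self.max_tokens]
        # Placeholder for the learned reasoning step (e.g., an RL-pretrained
        # policy applied to `context`); here it simply echoes the top fact.
        return retrieved[0] if retrieved else "unknown"


if __name__ == "__main__":
    memory = MemoryBank({"capital of France": "Paris is the capital of France."})
    print(SmallContextReasoner(memory).answer("What is the capital of France"))

The point of the sketch is the division of labor: factual knowledge lives in MemoryBank and is accessed through retrieval, while SmallContextReasoner carries only the (learned) reasoning procedure over a deliberately small context.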