Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought
January 8, 2025
作者: Violet Xiang, Charlie Snell, Kanishk Gandhi, Alon Albalak, Anikait Singh, Chase Blagden, Duy Phung, Rafael Rafailov, Nathan Lile, Dakota Mahan, Louis Castricato, Jan-Philipp Franken, Nick Haber, Chelsea Finn
cs.AI
Abstract
We propose a novel framework, Meta Chain-of-Thought (Meta-CoT), which extends
traditional Chain-of-Thought (CoT) by explicitly modeling the underlying
reasoning required to arrive at a particular CoT. We present empirical evidence
from state-of-the-art models exhibiting behaviors consistent with in-context
search, and explore methods for producing Meta-CoT via process supervision,
synthetic data generation, and search algorithms. We then outline a
concrete pipeline for training a model to produce Meta-CoTs, incorporating
instruction tuning with linearized search traces and reinforcement learning
post-training. Finally, we discuss open research questions, including scaling
laws, verifier roles, and the potential for discovering novel reasoning
algorithms. This work provides a theoretical and practical roadmap to enable
Meta-CoT in LLMs, paving the way for more powerful and human-like reasoning in
artificial intelligence.
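To make the abstract's notion of "linearized search traces" concrete, here is a minimal, hypothetical sketch: a toy depth-first search over candidate reasoning steps is serialized, including backtracking, into one flat sequence of the kind a model could be instruction-tuned on. The `<visit>`/`<backtrack>`/`<solution>` markers, the goal test, and the expansion rule are illustrative assumptions, not the paper's actual trace format.

```python
# Hypothetical illustration of a "linearized search trace" (Meta-CoT style):
# every node visit, dead-end backtrack, and the final solution are recorded
# in order, so the search process itself becomes a training sequence.

def linearize_search(state, expand, is_goal, trace=None):
    """Depth-first search that appends each step to a flat trace list."""
    if trace is None:
        trace = []
    trace.append(f"<visit>{state}")
    if is_goal(state):
        trace.append(f"<solution>{state}")  # terminal answer, search stops
        return trace, True
    for child in expand(state):
        trace, found = linearize_search(child, expand, is_goal, trace)
        if found:
            return trace, True
    trace.append(f"<backtrack>{state}")  # dead end made explicit in the trace
    return trace, False

# Toy problem: find a string of up to three digits (1-3) whose digit sum is 6.
expand = lambda s: [s + d for d in "123"] if len(s) < 3 else []
is_goal = lambda s: bool(s) and sum(map(int, s)) == 6

trace, ok = linearize_search("", expand, is_goal)
print(" ".join(trace))
```

The point of the linearization is that failed branches and backtracks stay visible in the sequence, so a model trained on such traces sees the exploratory process, not just the final chain of thought.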