Self-Steering Language Models

Abstract

While test-time reasoning enables language models to tackle complex tasks, searching or planning in natural language can be slow, costly, and error-prone. But even when LMs struggle to emulate the precise reasoning steps needed to solve a problem, they often excel at describing its abstract structure: both how to verify solutions and how to search for them. This paper introduces DisCIPL, a method for "self-steering" LMs where a Planner model generates a task-specific inference program that is executed by a population of Follower models. Our approach equips LMs with the ability to write recursive search procedures that guide LM inference, enabling new forms of verifiable and efficient reasoning. When instantiated with a small Follower (e.g., Llama-3.2-1B), DisCIPL matches (and sometimes outperforms) much larger models, including GPT-4o and o1, on challenging constrained generation tasks. In decoupling planning from execution, our work opens up a design space of highly parallelized Monte Carlo inference strategies that outperform standard best-of-N sampling, require no finetuning, and can be implemented automatically by existing LMs.
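
The contrast between best-of-N sampling and a step-wise Monte Carlo search can be illustrated with a toy sketch. This is not the DisCIPL implementation: the follower below is a random word sampler standing in for a small LM, the constraint (no word contains the letter "e") and every function name are hypothetical, and the inference program is hand-written here rather than generated by a Planner model.

```python
import random

# Hypothetical vocabulary; a real Follower would be a small LM.
VOCAB = ["the", "cat", "sat", "on", "a", "mat",
         "here", "every", "dog", "ran", "fast", "home"]


def follower_propose(prefix, rng):
    """Stand-in for a Follower LM: propose one next word at random."""
    return rng.choice(VOCAB)


def step_ok(word):
    """Per-step check for the assumed constraint: no word contains 'e'."""
    return "e" not in word


def verify(words, length):
    """Verifier for a complete candidate sequence."""
    return len(words) == length and all(step_ok(w) for w in words)


def best_of_n(n, length, rng):
    """Baseline: sample n full sequences independently.

    Returns the first sequence that passes the verifier, or None.
    """
    for _ in range(n):
        words = []
        for _ in range(length):
            words.append(follower_propose(words, rng))
        if verify(words, length):
            return words
    return None


def particle_search(num_particles, length, rng):
    """Step-wise search: extend each particle, drop violators, resample."""
    particles = [[] for _ in range(num_particles)]
    for _ in range(length):
        survivors = []
        # Retry the step if every extension violated the constraint.
        while not survivors:
            extended = [p + [follower_propose(p, rng)] for p in particles]
            survivors = [p for p in extended if step_ok(p[-1])]
        # Weights are 0 or 1, so resampling is uniform over survivors.
        particles = [list(rng.choice(survivors)) for _ in range(num_particles)]
    return particles[0]


if __name__ == "__main__":
    print("best-of-N:      ", best_of_n(8, 6, random.Random(0)))
    print("particle search:", particle_search(8, 6, random.Random(0)))
```

In this sketch the particle search checks the constraint after every word and reallocates its budget to surviving prefixes, so it always returns a valid sequence, whereas best-of-N only verifies complete samples and may return nothing. Each particle's extension is independent within a step, which is what makes this family of strategies straightforward to parallelize.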
