Self-Steering Language Models
April 9, 2025
Authors: Gabriel Grand, Joshua B. Tenenbaum, Vikash K. Mansinghka, Alexander K. Lew, Jacob Andreas
cs.AI
Abstract
While test-time reasoning enables language models to tackle complex tasks,
searching or planning in natural language can be slow, costly, and error-prone.
But even when LMs struggle to emulate the precise reasoning steps needed to
solve a problem, they often excel at describing its abstract structure--both
how to verify solutions and how to search for them. This paper introduces
DisCIPL, a method for "self-steering" LMs where a Planner model generates a
task-specific inference program that is executed by a population of Follower
models. Our approach equips LMs with the ability to write recursive search
procedures that guide LM inference, enabling new forms of verifiable and
efficient reasoning. When instantiated with a small Follower (e.g.,
Llama-3.2-1B), DisCIPL matches (and sometimes outperforms) much larger models,
including GPT-4o and o1, on challenging constrained generation tasks. In
decoupling planning from execution, our work opens up a design space of
highly-parallelized Monte Carlo inference strategies that outperform standard
best-of-N sampling, require no finetuning, and can be implemented automatically
by existing LMs.
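To make the contrast between best-of-N sampling and the population-based strategies the abstract alludes to concrete, here is a minimal toy sketch. It does not use the paper's actual DisCIPL code or a real LM: `follower_extend` is a hypothetical stand-in for a Follower model and `satisfies` a stand-in verifier, and the constraint (every word starts with "s") is invented purely for illustration. Best-of-N only checks complete sequences, while the particle-filter-style loop verifies partial sequences and resamples survivors at every step.

```python
import random

random.seed(0)

SEQ_LEN = 6  # toy task: a sequence of 6 words, each starting with "s"
VOCAB = ["sun", "sea", "sand", "tree", "dog", "sky"]

def follower_extend(prefix):
    """Hypothetical stand-in for a Follower LM: append one sampled token."""
    return prefix + [random.choice(VOCAB)]

def satisfies(tokens):
    """Stand-in verifier: the toy constraint holds for every token so far."""
    return all(w.startswith("s") for w in tokens)

def best_of_n(n=50):
    """Baseline: sample n complete sequences, return the first valid one."""
    for _ in range(n):
        seq = []
        for _ in range(SEQ_LEN):
            seq = follower_extend(seq)
        if satisfies(seq):
            return seq
    return None

def particle_resampling(num_particles=20):
    """Population strategy: extend all particles in parallel, then
    discard partial sequences that already violate the constraint and
    resample the population from the survivors."""
    particles = [[] for _ in range(num_particles)]
    for _ in range(SEQ_LEN):
        particles = [follower_extend(p) for p in particles]
        survivors = [p for p in particles if satisfies(p)]
        if not survivors:
            return None  # whole population died out
        particles = [random.choice(survivors) for _ in range(num_particles)]
    return particles[0]

result = particle_resampling()
```

Because invalid partial sequences are pruned early, the resampling strategy wastes far fewer Follower calls on doomed continuations than best-of-N, which is the intuition behind the Monte Carlo inference programs the Planner writes.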