Generating Symbolic World Models via Test-time Scaling of Large Language Models

Abstract

Solving complex planning problems requires Large Language Models (LLMs) to explicitly model state transitions in order to avoid rule violations, comply with constraints, and ensure optimality, a task hindered by the inherent ambiguity of natural language. To overcome this ambiguity, the Planning Domain Definition Language (PDDL) is used as a planning abstraction that enables precise, formal state descriptions. With PDDL, we can generate a symbolic world model to which classical search algorithms, such as A*, can be applied directly to find optimal plans. However, generating PDDL domains directly with current LLMs remains an open challenge because PDDL training data is scarce. To address this challenge, we propose scaling up the test-time computation of LLMs to enhance their PDDL reasoning capabilities, thereby enabling the generation of high-quality PDDL domains. Specifically, we introduce a simple yet effective algorithm that first employs Best-of-N sampling to improve the quality of the initial solution and then refines the solution in a fine-grained manner with verbalized machine learning. Our method outperforms o1-mini by a considerable margin in generating PDDL domains, achieving a success rate of over 50% on two tasks (generating PDDL domains from natural language descriptions or from PDDL problems) without requiring additional training. By taking advantage of PDDL as a state abstraction, our method outperforms current state-of-the-art methods on almost all competition-level planning tasks.
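The two-stage procedure described in the abstract (Best-of-N sampling followed by iterative refinement) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not say how candidates are scored or revised, so `toy_score`, `toy_critique`, `toy_revise`, and the fixed `DRAFTS` list are hypothetical stand-ins for an LLM sampler, a PDDL validity check, and a natural-language critique step. This is not the authors' implementation.

```python
import itertools
from typing import Callable, List, Tuple


def best_of_n(generate: Callable[[], str],
              score: Callable[[str], float],
              n: int) -> Tuple[str, float]:
    """Sample n candidates and return the highest-scoring one with its score."""
    if n < 1:
        raise ValueError("n must be at least 1")
    scored = [(score(c), c) for c in (generate() for _ in range(n))]
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score


def refine(candidate: str,
           score: Callable[[str], float],
           critique: Callable[[str], str],
           revise: Callable[[str, str], str],
           steps: int) -> Tuple[str, float]:
    """Critique and revise repeatedly, keeping a revision only if it scores higher."""
    current, current_score = candidate, score(candidate)
    for _ in range(steps):
        feedback = critique(current)          # textual feedback on the draft
        proposal = revise(current, feedback)  # new draft conditioned on feedback
        proposal_score = score(proposal)
        if proposal_score > current_score:
            current, current_score = proposal, proposal_score
    return current, current_score


# --- Hypothetical toy stand-ins (assumptions, not part of the source) ---
REQUIRED: List[str] = ["(:predicates", "(:action", ":precondition", ":effect"]

DRAFTS: List[str] = [
    "(define (domain toy))",
    "(define (domain toy) (:predicates (at ?x)))",
    "(define (domain toy) (:predicates (at ?x)) (:action move))",
]


def make_toy_generator(drafts: List[str]) -> Callable[[], str]:
    """Cycle through fixed drafts in place of sampling from an LLM."""
    cycle = itertools.cycle(drafts)
    return lambda: next(cycle)


def toy_score(domain: str) -> float:
    """Fraction of required PDDL section keywords present in the draft."""
    return sum(tok in domain for tok in REQUIRED) / len(REQUIRED)


def toy_critique(domain: str) -> str:
    """Name the first missing section, or return an empty string if none."""
    missing = [tok for tok in REQUIRED if tok not in domain]
    return missing[0] if missing else ""


def toy_revise(domain: str, feedback: str) -> str:
    """Append the section named in the feedback (a crude placeholder edit)."""
    return f"{domain} {feedback}" if feedback else domain


if __name__ == "__main__":
    initial, initial_score = best_of_n(make_toy_generator(DRAFTS), toy_score, n=3)
    final, final_score = refine(initial, toy_score, toy_critique, toy_revise, steps=3)
    print(initial_score, final_score)
```

In a real pipeline the score would more plausibly come from a PDDL parser or planner run than from keyword matching; the sketch only shows how the sampling stage supplies a starting point that the refinement loop then improves.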
