Classical Planning with LLM-Generated Heuristics: Challenging the State of the Art with Python Code
March 24, 2025
Authors: Augusto B. Corrêa, André G. Pereira, Jendrik Seipp
cs.AI
Abstract
In recent years, large language models (LLMs) have shown remarkable
capabilities in various artificial intelligence problems. However, they fail to
plan reliably, even when prompted with a detailed definition of the planning
task. Attempts to improve their planning capabilities, such as chain-of-thought
prompting, fine-tuning, and explicit "reasoning", still yield incorrect plans
and usually fail to generalize to larger tasks. In this paper, we show how to
use LLMs to generate correct plans, even for out-of-distribution tasks of
increasing size. For a given planning domain, we ask an LLM to generate several
domain-dependent heuristic functions in the form of Python code, evaluate them
on a set of training tasks within a greedy best-first search, and choose the
strongest one. The resulting LLM-generated heuristics solve many more unseen
test tasks than state-of-the-art domain-independent heuristics for classical
planning. They are even competitive with the strongest learning algorithm for
domain-dependent planning. These findings are especially remarkable given that
our proof-of-concept implementation is based on an unoptimized Python planner
and the baselines all build upon highly optimized C++ code. In some domains,
the LLM-generated heuristics expand fewer states than the baselines, revealing
that they are not only efficiently computable, but sometimes even more
informative than the state-of-the-art heuristics. Overall, our results show
that sampling a set of planning heuristic function programs can significantly
improve the planning capabilities of LLMs.
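
The pipeline described in the abstract (sample several candidate heuristics, run each inside a greedy best-first search on training tasks, keep the strongest) can be illustrated with a minimal sketch. This is not the authors' implementation: the grid domain, the three hand-written candidates standing in for LLM-generated code, the task dictionaries, and the names `gbfs`, `evaluate`, and `select_best` are all assumptions made for illustration.

```python
import heapq
from itertools import count

MOVES = {"right": (1, 0), "left": (-1, 0), "up": (0, 1), "down": (0, -1)}

def make_task(size, start, goal):
    """Hypothetical toy domain: move on an empty size x size grid."""
    def successors(state):
        x, y = state
        for action, (dx, dy) in MOVES.items():
            nxt = (x + dx, y + dy)
            if 0 <= nxt[0] < size and 0 <= nxt[1] < size:
                yield action, nxt
    return {"start": start, "goal": goal, "successors": successors}

def gbfs(task, heuristic, max_expansions):
    """Greedy best-first search: always expand the open state with lowest h.

    Returns (plan, expansions); plan is None if the budget is exhausted.
    """
    goal = task["goal"]
    tie = count()  # FIFO tie-breaker so the heap never compares states
    frontier = [(heuristic(task["start"], goal), next(tie), task["start"], [])]
    seen = {task["start"]}
    expansions = 0
    while frontier and expansions < max_expansions:
        _, _, state, plan = heapq.heappop(frontier)
        if state == goal:
            return plan, expansions
        expansions += 1
        for action, nxt in task["successors"](state):
            if nxt not in seen:
                seen.add(nxt)
                entry = (heuristic(nxt, goal), next(tie), nxt, plan + [action])
                heapq.heappush(frontier, entry)
    return None, expansions

# Stand-ins for LLM-generated candidates (assumed, written by hand here).
def h_blind(state, goal):
    return 0

def h_manhattan(state, goal):
    return abs(state[0] - goal[0]) + abs(state[1] - goal[1])

def h_misleading(state, goal):
    return -h_manhattan(state, goal)

def evaluate(heuristic, tasks, max_expansions):
    """Score = (tasks solved, negated total expansions); higher is better."""
    solved, total = 0, 0
    for task in tasks:
        plan, expansions = gbfs(task, heuristic, max_expansions)
        solved += plan is not None
        total += expansions
    return solved, -total

def select_best(candidates, tasks, max_expansions):
    """Keep the candidate that performs best on the training tasks."""
    return max(candidates, key=lambda h: evaluate(h, tasks, max_expansions))

training_tasks = [make_task(8, (0, 0), (7, 7)), make_task(10, (0, 0), (9, 9))]
candidates = [h_blind, h_manhattan, h_misleading]
best = select_best(candidates, training_tasks, max_expansions=50)

# Apply the selected heuristic to a larger, unseen task.
test_task = make_task(20, (0, 0), (19, 19))
test_plan, test_expansions = gbfs(test_task, best, max_expansions=1000)
print(best.__name__, len(test_plan), test_expansions)
```

In this toy setup the Manhattan-distance candidate is selected, because the other two exhaust the 50-expansion budget on both training grids. Whatever heuristic is chosen, any plan the search returns is a valid action sequence by construction, which matches the abstract's point that the LLM supplies guidance while the search guarantees correctness.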