Generating Symbolic World Models via Test-time Scaling of Large Language Models

Abstract

Solving complex planning problems requires Large Language Models (LLMs) to explicitly model state transitions in order to avoid rule violations, comply with constraints, and ensure optimality. This task is hindered by the inherent ambiguity of natural language. To overcome such ambiguity, the Planning Domain Definition Language (PDDL) is leveraged as a planning abstraction that enables precise and formal state descriptions. With PDDL, we can generate a symbolic world model to which classical search algorithms, such as A*, can be seamlessly applied to find optimal plans. However, directly generating PDDL domains with current LLMs remains an open challenge due to the scarcity of PDDL training data. To address this challenge, we propose scaling up the test-time computation of LLMs to enhance their PDDL reasoning capabilities, thereby enabling the generation of high-quality PDDL domains. Specifically, we introduce a simple yet effective algorithm that first employs Best-of-N sampling to improve the quality of the initial solution and then refines that solution in a fine-grained manner with verbalized machine learning. Without requiring additional training, our method outperforms o1-mini by a considerable margin in PDDL domain generation, achieving a success rate above 50% on two tasks: generating PDDL domains from natural language descriptions and from PDDL problems. By using PDDL as a state abstraction, our method outperforms current state-of-the-art methods on almost all competition-level planning tasks.
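The two ideas in the abstract (classical search over a symbolic world model, and Best-of-N sampling followed by iterative refinement) can be illustrated together in a small Python sketch. It rests on heavy simplifying assumptions: domains are reduced to lists of ground STRIPS actions rather than full PDDL with typed parameters, A* with an admissible goal-count heuristic plays the role of the classical planner, and a candidate domain is scored by the fraction of test problems the planner can solve. That scoring rule is an assumption for illustration, not necessarily the verifier the authors use; a real pipeline would also validate the resulting plans against a reference domain. All names (`astar_plan`, `solve_rate`, `best_of_n_then_refine`, `ACTION_POOL`, `sample_domain`, `refine_domain`) are hypothetical, and the two stubs merely stand in for LLM sampling and the verbalized-machine-learning refinement step.

```python
import heapq
import math
import random
from typing import Callable, FrozenSet, List, Optional, Tuple

Fact = str
State = FrozenSet[Fact]
# A ground STRIPS action: (name, preconditions, add effects, delete effects).
Action = Tuple[str, FrozenSet[Fact], FrozenSet[Fact], FrozenSet[Fact]]
Problem = Tuple[State, State]  # (initial state, goal facts)


def act(name: str, pre, add, dele=()) -> Action:
    return (name, frozenset(pre), frozenset(add), frozenset(dele))


def astar_plan(init: State, goal: State, actions: List[Action]) -> Optional[List[str]]:
    """A* over the symbolic state space; returns a shortest plan or None."""
    max_add = max((len(a[2]) for a in actions), default=1) or 1

    def h(s: State) -> int:
        # Admissible and consistent: one action adds at most max_add facts.
        return math.ceil(len(goal - s) / max_add)

    frontier = [(h(init), 0, 0, init, [])]  # (f, g, tie-breaker, state, plan)
    best_g = {init: 0}
    tie = 1
    while frontier:
        _, g, _, state, plan = heapq.heappop(frontier)
        if goal <= state:
            return plan
        if g > best_g[state]:
            continue  # stale entry superseded by a cheaper path
        for name, pre, add, dele in actions:
            if pre <= state:
                nxt = (state - dele) | add
                if g + 1 < best_g.get(nxt, math.inf):
                    best_g[nxt] = g + 1
                    heapq.heappush(frontier, (g + 1 + h(nxt), g + 1, tie, nxt, plan + [name]))
                    tie += 1
    return None


def solve_rate(domain: List[Action], problems: List[Problem]) -> float:
    """Hypothetical verifier: fraction of problems the planner can solve."""
    return sum(astar_plan(i, g, domain) is not None for i, g in problems) / len(problems)


def best_of_n_then_refine(
    sample: Callable[[], List[Action]],
    refine: Callable[[List[Action], List[Problem]], List[Action]],
    problems: List[Problem],
    n: int = 8,
    rounds: int = 3,
) -> List[Action]:
    # Stage 1 (Best-of-N): keep the highest-scoring of n independent samples.
    best = max((sample() for _ in range(n)), key=lambda d: solve_rate(d, problems))
    # Stage 2: revise the current best using feedback (the unsolved problems).
    for _ in range(rounds):
        failed = [p for p in problems if astar_plan(p[0], p[1], best) is None]
        if not failed:
            break
        candidate = refine(best, failed)
        if solve_rate(candidate, problems) >= solve_rate(best, problems):
            best = candidate
    return best


# Toy world: switch on the power, unlock, then open a door.
ACTION_POOL = [
    act("turn-on", ["off"], ["on"], ["off"]),
    act("unlock", ["on", "locked"], ["unlocked"], ["locked"]),
    act("open", ["unlocked"], ["open"]),
]
PROBLEMS = [(frozenset({"off", "locked"}), frozenset({"open"}))]
rng = random.Random(0)


def sample_domain() -> List[Action]:
    # Stub for one LLM sample: a random subset of the action pool.
    return [a for a in ACTION_POOL if rng.random() < 0.5]


def refine_domain(domain: List[Action], failed: List[Problem]) -> List[Action]:
    # Stub for a refinement step: add the first missing action (ignores feedback).
    return domain + [a for a in ACTION_POOL if a not in domain][:1]


domain = best_of_n_then_refine(sample_domain, refine_domain, PROBLEMS)
print(astar_plan(*PROBLEMS[0], domain))  # → ['turn-on', 'unlock', 'open']
```

Because a single action adds at most `max_add` facts, the heuristic never overestimates the remaining unit cost, so A* returns a shortest plan once a usable domain exists. The refinement loop accepts a revision only if it does not lower the score, which mirrors the "sample broadly, then improve the best candidate" structure described above.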