Revealing the Barriers of Language Agents in Planning

Abstract

Autonomous planning has been an ongoing pursuit since the inception of artificial intelligence. Early planning agents, built on curated problem solvers, could deliver precise solutions for specific tasks but lacked generalization. The emergence of large language models (LLMs) and their powerful reasoning capabilities has reignited interest in autonomous planning, since they can automatically generate reasonable solutions for given tasks. However, prior research and our experiments show that current language agents still lack human-level planning abilities. Even the state-of-the-art reasoning model, OpenAI o1, achieves only 15.6% on one of the complex real-world planning benchmarks. This highlights a critical question: what hinders language agents from achieving human-level planning? Although existing studies have highlighted weak performance in agent planning, the deeper underlying issues, as well as the mechanisms and limitations of the strategies proposed to address them, remain insufficiently understood. In this work, we apply a feature attribution study and identify two key factors that hinder agent planning: the limited role of constraints and the diminishing influence of questions. We also find that although current strategies help mitigate these challenges, they do not fully resolve them, indicating that agents still have a long way to go before reaching human-level intelligence.
