MPO: Boosting LLM Agents with Meta Plan Optimization
Abstract
Recent advancements in large language models (LLMs) have enabled LLM-based agents to successfully tackle interactive planning tasks. However, despite their successes, existing approaches often suffer from planning hallucinations and require retraining for each new agent. To address these challenges, we propose the Meta Plan Optimization (MPO) framework, which enhances agent planning capabilities by directly incorporating explicit guidance. Unlike previous methods that rely on complex knowledge, which either requires significant human effort or lacks quality assurance, MPO leverages high-level general guidance through meta plans to assist agent planning and enables continuous optimization of the meta plans based on feedback from the agent's task execution. Our experiments conducted on two representative tasks demonstrate that MPO significantly outperforms existing baselines. Moreover, our analysis indicates that MPO provides a plug-and-play solution that enhances both task completion efficiency and generalization capabilities in previously unseen scenarios.
Summary
Paper Overview
Core Contribution
- Introduces the Meta Plan Optimization (MPO) framework to enhance LLM agent planning capabilities.
- Provides a plug-and-play solution for improving task completion efficiency and generalization.
- Leverages high-level meta plans for explicit guidance and continuous optimization.
Research Context
- Addresses planning hallucinations and retraining challenges in LLM-based agents.
- Builds on prior work in implicit planning and explicit knowledge-guided approaches.
Keywords
- Meta Plan Optimization (MPO)
- Large Language Models (LLMs)
- Planning Hallucination
- Task Completion Efficiency
- Generalization
Background
Research Gap
- Existing methods suffer from planning hallucinations and require retraining for new agents.
- Lack of high-quality, generalizable explicit guidance for agent planning.
Technical Challenges
- Ensuring meta plan quality and adaptability across diverse tasks.
- Integrating meta plans into agent workflows without disrupting reasoning.
Prior Approaches
- Implicit planning methods (e.g., ReAct, Reflexion).
- Explicit knowledge-guided approaches (e.g., KnowAgent, WKM).
- Trajectory tuning methods (e.g., AgentTuning, ETO).
Methodology
Technical Architecture
- Meta Planner: Generates high-level meta plans.
- Agent: Executes tasks and provides feedback for meta plan optimization.
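The two-component architecture above can be sketched as a simple loop: the meta planner produces high-level guidance, the agent executes under it, and the execution feedback is recorded for later meta plan optimization. This is a minimal illustrative sketch, not the paper's implementation; `call_meta_planner` and `call_agent` are hypothetical stubs standing in for real LLM and environment calls.

```python
def call_meta_planner(task: str) -> str:
    """Stub: a real implementation would query the meta planner model."""
    return (f"1. Identify the objects relevant to: {task}\n"
            "2. Carry out the required actions step by step\n"
            "3. Verify that the goal state is reached")

def call_agent(task: str, meta_plan: str) -> tuple[bool, float]:
    """Stub: a real agent would act in the environment (e.g. ALFWorld),
    conditioned on the meta plan. Returns (success, reward) as feedback."""
    return True, 1.0

def mpo_step(task: str) -> dict:
    # Meta plan: high-level guidance, deliberately free of environment details.
    meta_plan = call_meta_planner(task)
    # Agent executes the task under that guidance.
    success, reward = call_agent(task, meta_plan)
    # Feedback is stored so the meta planner can later be optimized on it.
    return {"task": task, "meta_plan": meta_plan,
            "success": success, "reward": reward}

record = mpo_step("put a clean mug on the desk")
```

The key design choice is that the meta planner never sees low-level environment state, which is what lets one meta planner serve different agents.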
Implementation Details
- Supervised Fine-Tuning (SFT) for meta planner initialization.
- Monte Carlo (MC) sampling for meta plan quality evaluation.
- Direct Preference Optimization (DPO) for meta planner refinement.
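The MC-sampling and DPO steps above can be illustrated with a short sketch: each candidate meta plan is scored by the empirical success rate of agent rollouts, and higher- vs. lower-scoring plans for the same task form (chosen, rejected) preference pairs for DPO training. All function names here are hypothetical, and the rollout is a random stub rather than a real agent.

```python
import random

def rollout_success(meta_plan_quality: float) -> bool:
    """Stub rollout: success probability stands in for the agent
    actually executing the task under a given meta plan."""
    return random.random() < meta_plan_quality

def mc_score(meta_plan_quality: float, n: int = 16) -> float:
    """Monte Carlo estimate of a meta plan's quality:
    the fraction of n agent rollouts that succeed."""
    return sum(rollout_success(meta_plan_quality) for _ in range(n)) / n

def build_dpo_pair(task: str, plan_a: str, score_a: float,
                   plan_b: str, score_b: float) -> dict:
    """Form a (chosen, rejected) preference pair in the usual
    prompt/chosen/rejected format used for DPO training."""
    if score_a >= score_b:
        return {"prompt": task, "chosen": plan_a, "rejected": plan_b}
    return {"prompt": task, "chosen": plan_b, "rejected": plan_a}

pair = build_dpo_pair("put a clean mug on the desk",
                      "plan A", mc_score(1.0), "plan B", mc_score(0.0))
```

A usage note: in practice the scores would come from real environment rollouts, and the resulting pairs would be fed to a standard DPO trainer to refine the SFT-initialized meta planner.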
Innovation Points
- Decouples meta plans from specific environmental details.
- Enables continuous optimization of meta plans based on agent feedback.
- Provides a plug-and-play solution compatible with existing frameworks.
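The plug-and-play property described above amounts to injecting the meta plan as extra context for an otherwise unmodified agent. A minimal sketch, with a hypothetical `inject_meta_plan` helper (the exact prompt layout is an assumption, not the paper's template):

```python
def inject_meta_plan(agent_prompt: str, meta_plan: str) -> str:
    """Prepend the meta plan as guidance; the agent's own prompt and
    weights are untouched, which is what makes this plug-and-play."""
    return f"Meta plan:\n{meta_plan}\n\n{agent_prompt}"

prompt = inject_meta_plan(
    "Task: put a clean mug on the desk.",
    "1. Find the mug\n2. Clean it\n3. Place it on the desk",
)
```

Because the agent is only consuming additional text, the same meta planner can sit in front of ReAct-style or fine-tuned agents alike.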
Results
Experimental Setup
- Benchmarks: ALFWorld (embodied household tasks) and ScienceWorld (textual science experiment tasks).
- Base Models: GPT-4o, Llama-3.1-8B-Instruct, Qwen2.5-7B-Instruct, etc.
Key Findings
- MPO significantly outperforms baselines, with up to 100% performance improvement.
- Enhances task completion efficiency and generalization in unseen scenarios.
- Compatible with various agent frameworks, delivering additional performance gains.
Limitations
- Meta planner initialization relies on GPT-4o, which may introduce biases.
- Limited exploration of lightweight models for meta planner construction.
- Sampling methods for DPO training could be further optimized.