OctoTools: An Agentic Framework with Extensible Tools for Complex Reasoning
February 16, 2025
Authors: Pan Lu, Bowen Chen, Sheng Liu, Rahul Thapa, Joseph Boen, James Zou
cs.AI
Abstract
Solving complex reasoning tasks may involve visual understanding, domain
knowledge retrieval, numerical calculation, and multi-step reasoning. Existing
methods augment large language models (LLMs) with external tools but are
restricted to specialized domains or limited tool types, or require additional
training data. In this paper, we introduce OctoTools, a training-free,
user-friendly, and easily extensible open-source agentic framework designed to
tackle complex reasoning across diverse domains. OctoTools introduces
standardized tool cards to encapsulate tool functionality, a planner for both
high-level and low-level planning, and an executor to carry out tool usage. We
validate OctoTools' generality across 16 diverse tasks (including MathVista,
MMLU-Pro, MedQA, and GAIA-Text), achieving substantial average accuracy gains
of 9.3% over GPT-4o. Furthermore, OctoTools outperforms AutoGen, GPT-Functions
and LangChain by up to 10.6% when given the same set of tools. Through
comprehensive analysis and ablations, OctoTools demonstrates advantages in task
planning, effective tool usage, and multi-step problem solving.
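
The three components named in the abstract (tool cards, a planner, and an executor) can be illustrated with a minimal Python sketch. This is not the OctoTools API: the names `ToolCard`, `plan`, `execute`, `solve`, and `calculator` are hypothetical, and a fixed rule stands in for the LLM-driven planning the abstract describes. It only shows how tool metadata, planning, and execution might be kept separate.

```python
import ast
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Arithmetic operators the toy calculator accepts.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def _eval(node):
    """Evaluate a parsed arithmetic expression without using eval()."""
    if isinstance(node, ast.Expression):
        return _eval(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
    raise ValueError("unsupported expression")


def calculator(expression: str) -> str:
    return str(_eval(ast.parse(expression, mode="eval")))


@dataclass
class ToolCard:
    """Hypothetical tool card: metadata wrapped around a callable."""
    name: str
    description: str
    run: Callable[[str], str]


def plan(query: str, tools: Dict[str, ToolCard]) -> List[Tuple[str, str]]:
    """Toy planner: a fixed rule replaces the LLM planner (assumption)."""
    steps = []
    if any(ch.isdigit() for ch in query) and "calculator" in tools:
        # Keep only the characters that can belong to an arithmetic expression.
        expr = "".join(ch for ch in query if ch in "0123456789+-*/() ")
        steps.append(("calculator", expr.strip()))
    return steps


def execute(steps: List[Tuple[str, str]], tools: Dict[str, ToolCard]) -> List[str]:
    """Executor: run each planned (tool name, argument) pair in order."""
    return [tools[name].run(arg) for name, arg in steps]


def solve(query: str, tools: Dict[str, ToolCard]) -> str:
    results = execute(plan(query, tools), tools)
    return results[-1] if results else "no tool applicable"


TOOLS = {"calculator": ToolCard("calculator", "Evaluates + - * / arithmetic.", calculator)}
print(solve("What is 12 * (3 + 4)?", TOOLS))
```

In this sketch a new tool is added by registering one more `ToolCard` in `TOOLS`, without changing `execute`; the planner rule would also need to know when to select it.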