APIGen-MT: Agentic Pipeline for Multi-Turn Data Generation via Simulated Agent-Human Interplay

Abstract

Training effective AI agents for multi-turn interactions requires high-quality data that captures realistic human-agent dynamics, yet such data is scarce and expensive to collect manually. We introduce APIGen-MT, a two-phase framework that generates verifiable and diverse multi-turn agent data. In the first phase, our agentic pipeline produces detailed task blueprints with ground-truth actions, leveraging a committee of LLM reviewers and iterative feedback loops. In the second phase, these blueprints are transformed into complete interaction trajectories through simulated human-agent interplay. We train a family of models, the xLAM-2-fc-r series, with sizes ranging from 1B to 70B parameters. Our models outperform frontier models such as GPT-4o and Claude 3.5 on the tau-bench and BFCL benchmarks. The smaller models surpass their larger counterparts, particularly in multi-turn settings, while maintaining superior consistency across multiple trials. Comprehensive experiments demonstrate that our verified blueprint-to-details approach yields high-quality training data, enabling the development of more reliable, efficient, and capable agents. We open-source both the collected synthetic data and the trained xLAM-2-fc-r models to advance research in AI agents. Models are available on HuggingFace at https://huggingface.co/collections/Salesforce/xlam-2-67ef5be12949d8dcdae354c4 and the project website is https://apigen-mt.github.io.
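The two-phase flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: every name here (`Blueprint`, `generate_blueprint`, `simulate_trajectory`, and the stub proposer, reviewer, user, and agent) is hypothetical, the LLM calls are replaced by plain Python functions, and the rule that a trajectory is kept only when the agent's actions match the blueprint is an assumption about what "verifiable" means in practice.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Blueprint:
    instruction: str      # what the simulated user wants
    actions: List[str]    # ground-truth actions for the task


# A reviewer returns feedback text, or None to approve (hypothetical interface).
Reviewer = Callable[[Blueprint], Optional[str]]


def generate_blueprint(propose, reviewers: List[Reviewer], max_rounds: int = 3):
    """Phase 1 (sketch): propose, collect committee feedback, retry."""
    feedback: List[str] = []
    for _ in range(max_rounds):
        bp = propose(feedback)
        feedback = [m for m in (review(bp) for review in reviewers) if m is not None]
        if not feedback:  # every reviewer approved
            return bp
    return None  # no approved blueprint within the round budget


def simulate_trajectory(bp: Blueprint, user_turn, agent_turn, max_turns: int = 10):
    """Phase 2 (sketch): roll out a dialogue from an approved blueprint.

    Assumption: the trajectory is kept only if the agent's executed actions
    equal the blueprint's ground-truth actions.
    """
    history: List[Tuple[str, str]] = []
    executed: List[str] = []
    for _ in range(max_turns):
        utterance = user_turn(bp, history)
        if utterance is None:  # simulated user has nothing more to say
            break
        history.append(("user", utterance))
        reply, actions = agent_turn(history)
        history.append(("agent", reply))
        executed.extend(actions)
    return history if executed == bp.actions else None


# Stub components standing in for LLM calls (all hypothetical).
def propose(feedback):
    actions = ["lookup_order"]
    if feedback:  # revise the draft once reviewers have complained
        actions.append("cancel_order")
    return Blueprint("Cancel my order", actions)


def needs_cancel(bp):
    return None if "cancel_order" in bp.actions else "missing cancel_order action"


def user_turn(bp, history):
    return bp.instruction if not history else None


def agent_turn(history):
    return "Your order is cancelled.", ["lookup_order", "cancel_order"]


blueprint = generate_blueprint(propose, [needs_cancel])
trajectory = simulate_trajectory(blueprint, user_turn, agent_turn)
```

In this toy run the first draft is rejected by the single reviewer, the second draft is approved, and the simulated dialogue is kept because the agent's actions match the blueprint. A real pipeline would replace each stub with model calls and an executable environment.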
