Towards Internet-Scale Training For Agents
February 10, 2025
Authors: Brandon Trabucco, Gunnar Sigurdsson, Robinson Piramuthu, Ruslan Salakhutdinov
cs.AI
Abstract
The predominant approach for training web navigation agents gathers human
demonstrations for a set of popular websites and hand-written tasks, but it is
becoming clear that human data are an inefficient resource. We develop a
pipeline to facilitate Internet-scale training for agents without laborious
human annotations. In the first stage, an LLM generates tasks for 150k diverse
websites. In the next stage, LLM agents complete tasks and produce
trajectories. In the final stage, an LLM reviews the trajectories and judges
their success. Language models are competitive with human annotators, detecting
and filtering out harmful content with an accuracy of 97%, generating feasible
tasks with an 89% rate, and judging successful trajectories with an 82.6%
accuracy. Scaling the pipeline, agents based on Llama 3.1 70B solve 16.7% of
tasks for 150k sites. Training on the data generated by our pipeline is
competitive with training on human demonstrations. In data-limited settings
derived from Mind2Web and WebLINX, we improve Step Accuracy by up to +89.5% and
+122.1% respectively for agents trained on mixtures of data from our pipeline
and human data. When training agents with all available human data from these
and human data. When training agents with all available human data from these
benchmarks, agents fail to generalize to diverse real sites, and adding our
data improves their generalization by +149.0% for WebLINX and +156.3% for
Mind2Web. Code will be available at: data-for-agents.github.io.
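The three-stage pipeline above (task generation, trajectory collection, and LLM judging) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the LLM calls are replaced with toy stand-ins so the control flow is runnable, and every function name (`generate_task`, `run_agent`, `judge_trajectory`, `pipeline`) is a hypothetical placeholder.

```python
# Hedged sketch of the three-stage data pipeline described in the abstract.
# All LLM calls are stubbed; in the real pipeline each stage would query a
# language model (e.g., Llama 3.1 70B for the agent stage).

def generate_task(website: str) -> str:
    # Stage 1: an LLM proposes a feasible task for the given website (stubbed).
    return f"Find the contact page on {website}"

def run_agent(task: str) -> list[str]:
    # Stage 2: an LLM agent attempts the task in a browser, producing a
    # trajectory of actions (stubbed here as plain action strings).
    return ["navigate to site", "click 'Contact' link", f"stop: {task}"]

def judge_trajectory(task: str, trajectory: list[str]) -> bool:
    # Stage 3: an LLM judge reviews the trajectory and labels success
    # (stubbed: treat any trajectory that ends with a stop action as success).
    return bool(trajectory) and trajectory[-1].startswith("stop:")

def pipeline(websites: list[str]) -> list[dict]:
    # Keep only judge-approved trajectories as training data.
    data = []
    for site in websites:
        task = generate_task(site)
        traj = run_agent(task)
        if judge_trajectory(task, traj):
            data.append({"site": site, "task": task, "trajectory": traj})
    return data

print(len(pipeline(["example.com", "example.org"])))  # 2
```

In the paper's setting, the same loop is scaled to 150k diverse websites, with the judge filtering out failed and harmful trajectories before the surviving data is mixed with human demonstrations for training.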