OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis

December 27, 2024
Authors: Qiushi Sun, Kanzhi Cheng, Zichen Ding, Chuanyang Jin, Yian Wang, Fangzhi Xu, Zhenyu Wu, Chengyou Jia, Liheng Chen, Zhoumianze Liu, Ben Kao, Guohao Li, Junxian He, Yu Qiao, Zhiyong Wu
cs.AI

Abstract

Graphical User Interface (GUI) agents powered by Vision-Language Models (VLMs) have demonstrated human-like computer control capability. Despite their utility in advancing digital automation, a critical bottleneck persists: collecting high-quality trajectory data for training. Common practices for collecting such data rely on human supervision or on synthetic data generation through executing pre-defined tasks, which are either resource-intensive or unable to guarantee data quality. Moreover, these methods suffer from limited data diversity and significant gaps between synthetic data and real-world environments. To address these challenges, we propose OS-Genesis, a novel GUI data synthesis pipeline that reverses the conventional trajectory collection process. Instead of relying on pre-defined tasks, OS-Genesis enables agents first to perceive environments and perform step-wise interactions, then retrospectively derive high-quality tasks to enable trajectory-level exploration. A trajectory reward model is then employed to ensure the quality of the generated trajectories. We demonstrate that training GUI agents with OS-Genesis significantly improves their performance on highly challenging online benchmarks. In-depth analysis further validates OS-Genesis's efficiency and its superior data quality and diversity compared to existing synthesis methods. Our code, data, and checkpoints are available at the OS-Genesis Homepage: https://qiushisun.github.io/OS-Genesis-Home/
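
The interaction-first ordering described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the GUI is a hypothetical dictionary of screens, and `derive_task`, `execute`, and `reward` are simple stand-ins for the VLM-based task derivation, the GUI agent, and the trajectory reward model. Filtering by a fixed threshold is also an assumption made for illustration.

```python
from collections import deque
from dataclasses import dataclass

# Hypothetical toy GUI: each screen maps an action name to the screen it opens.
TOY_GUI = {
    "home": {"tap_settings": "settings", "tap_search": "search"},
    "settings": {"tap_wifi": "wifi", "tap_back": "home"},
    "search": {"tap_back": "home"},
    "wifi": {"tap_back": "settings"},
}

@dataclass(frozen=True)
class Step:
    before: str  # screen observed before the action
    action: str
    after: str   # screen observed after the action

def explore(gui, start):
    """Interaction first: try every action on every reachable screen, with no task in mind."""
    steps, seen, frontier = [], {start}, deque([start])
    while frontier:
        screen = frontier.popleft()
        for action, nxt in gui[screen].items():
            steps.append(Step(screen, action, nxt))
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return steps

def derive_task(step):
    """Stand-in for a model that names a task after seeing the before/after states."""
    return {"instruction": f"Open the {step.after} screen", "goal": step.after}

def execute(gui, start, goal):
    """Stand-in agent: breadth-first search for a path of steps from start to goal."""
    frontier, seen = deque([(start, [])]), {start}
    while frontier:
        screen, path = frontier.popleft()
        if screen == goal:
            return path
        for action, nxt in gui[screen].items():
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [Step(screen, action, nxt)]))
    return []

def reward(task, trajectory):
    """Stand-in trajectory reward: 1.0 if the trajectory ends on the goal screen, else 0.0."""
    return 1.0 if trajectory and trajectory[-1].after == task["goal"] else 0.0

def synthesize(gui, start, threshold=0.5):
    """Explore, derive tasks retrospectively, roll out trajectories, keep the well-scored ones."""
    dataset = []
    for step in explore(gui, start):
        task = derive_task(step)
        trajectory = execute(gui, start, task["goal"])
        if reward(task, trajectory) >= threshold:
            dataset.append((task["instruction"], trajectory))
    return dataset
```

Calling `synthesize(TOY_GUI, "home")` yields instruction and trajectory pairs without any task having been written in advance; the tasks are read off the observed state changes, which is the reversal the abstract refers to.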

January 2, 2025