OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis

December 27, 2024
Authors: Qiushi Sun, Kanzhi Cheng, Zichen Ding, Chuanyang Jin, Yian Wang, Fangzhi Xu, Zhenyu Wu, Chengyou Jia, Liheng Chen, Zhoumianze Liu, Ben Kao, Guohao Li, Junxian He, Yu Qiao, Zhiyong Wu
cs.AI

Abstract

Graphical User Interface (GUI) agents powered by Vision-Language Models (VLMs) have demonstrated human-like computer control capability. Despite their utility in advancing digital automation, a critical bottleneck persists: collecting high-quality trajectory data for training. Common practices for collecting such data rely on human supervision or on synthetic data generated by executing pre-defined tasks, which are either resource-intensive or unable to guarantee data quality. Moreover, these methods suffer from limited data diversity and significant gaps between synthetic data and real-world environments. To address these challenges, we propose OS-Genesis, a novel GUI data synthesis pipeline that reverses the conventional trajectory collection process. Instead of relying on pre-defined tasks, OS-Genesis enables agents to first perceive environments and perform step-wise interactions, then retrospectively derive high-quality tasks to enable trajectory-level exploration. A trajectory reward model is then employed to ensure the quality of the generated trajectories. We demonstrate that training GUI agents with OS-Genesis significantly improves their performance on highly challenging online benchmarks. In-depth analysis further validates OS-Genesis's efficiency and its superior data quality and diversity compared to existing synthesis methods. Our code, data, and checkpoints are available on the OS-Genesis homepage (https://qiushisun.github.io/OS-Genesis-Home/).
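The reversed order of operations described in the abstract (interact first, derive tasks afterwards, then collect and score trajectories) can be illustrated with a minimal toy sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the dictionary-based `TOY_GUI`, the `Triple` and `Task` types, the template-based `derive_task` (standing in for a model that writes tasks from observed interactions), the breadth-first `execute` (standing in for an agent), and the binary `score` with a fixed threshold (standing in for the trajectory reward model) are all hypothetical.

```python
import random
from collections import deque
from dataclasses import dataclass

# Hypothetical toy GUI: each screen maps an action name to the next screen.
TOY_GUI = {
    "home": {"tap_settings": "settings", "tap_search": "search"},
    "settings": {"tap_wifi": "wifi", "tap_back": "home"},
    "search": {"tap_back": "home"},
    "wifi": {"tap_back": "settings"},
}

@dataclass(frozen=True)
class Triple:
    pre: str     # screen before the action
    action: str  # action taken
    post: str    # screen after the action

@dataclass(frozen=True)
class Task:
    instruction: str
    start: str
    goal: str

def explore(gui, start, steps, rng):
    """Task-free exploration: take random actions and record each transition."""
    triples, screen = [], start
    for _ in range(steps):
        action = rng.choice(sorted(gui[screen]))
        nxt = gui[screen][action]
        triples.append(Triple(screen, action, nxt))
        screen = nxt
    return triples

def derive_task(triple, start):
    """Stand-in for a model that writes a task after seeing an interaction."""
    return Task(f"Starting from {start}, open the {triple.post} screen",
                start, triple.post)

def execute(gui, task):
    """Stand-in for an agent: breadth-first search from start to goal."""
    queue, seen = deque([(task.start, [])]), {task.start}
    while queue:
        screen, path = queue.popleft()
        if screen == task.goal:
            return path
        for action, nxt in sorted(gui[screen].items()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [Triple(screen, action, nxt)]))
    return []

def score(task, trajectory):
    """Stand-in for a trajectory reward model: 1.0 if the goal is reached."""
    return 1.0 if trajectory and trajectory[-1].post == task.goal else 0.0

def synthesize(gui, start="home", steps=20, seed=0, threshold=0.5):
    """Explore, derive tasks in reverse, roll out, and keep scored trajectories."""
    triples = explore(gui, start, steps, random.Random(seed))
    tasks = {t.post: derive_task(t, start) for t in triples}  # one task per goal
    data = []
    for task in tasks.values():
        trajectory = execute(gui, task)
        if score(task, trajectory) >= threshold:
            data.append((task, trajectory))
    return data

if __name__ == "__main__":
    for task, trajectory in synthesize(TOY_GUI):
        print(task.instruction, [step.action for step in trajectory])
```

In this toy version, tasks whose goal is the start screen yield an empty trajectory and are dropped by the scoring step. A real system would replace each stand-in with a learned component and a live GUI environment.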
