
OpenWebVoyager: Building Multimodal Web Agents via Iterative Real-World Exploration, Feedback and Optimization

October 25, 2024
Authors: Hongliang He, Wenlin Yao, Kaixin Ma, Wenhao Yu, Hongming Zhang, Tianqing Fang, Zhenzhong Lan, Dong Yu
cs.AI

Abstract

The rapid development of large language and multimodal models has sparked significant interest in using proprietary models, such as GPT-4o, to develop autonomous agents capable of handling real-world scenarios like web navigation. Although recent open-source efforts have tried to equip agents with the ability to explore environments and continuously improve over time, they build text-only agents in synthetic environments where the reward signals are clearly defined. Such agents struggle to generalize to realistic settings that require multimodal perception abilities and lack ground-truth signals. In this paper, we introduce an open-source framework designed to facilitate the development of a multimodal web agent that can autonomously conduct real-world exploration and improve itself. We first train the base model with imitation learning to acquire basic abilities. We then let the agent explore the open web and collect feedback on its trajectories. After that, it further improves its policy by learning from well-performing trajectories, as judged by another general-purpose model. This exploration-feedback-optimization cycle can continue for several iterations. Experimental results show that our web agent successfully improves itself after each iteration, demonstrating strong performance across multiple test sets.
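
The exploration-feedback-optimization cycle described in the abstract can be sketched roughly as follows. This is a toy illustration only, not the authors' implementation: `ToyAgent`, `Trajectory`, `toy_judge`, and `run_cycle` are hypothetical names, the "agent" merely accumulates training data instead of fine-tuning a multimodal model, and the judge is a simple rule standing in for a general-purpose model.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Trajectory:
    task: str
    actions: List[str]


@dataclass
class ToyAgent:
    # Hypothetical stand-in for a multimodal policy: "fine-tuning" here
    # only records the trajectories the agent was trained on.
    training_data: List[Trajectory] = field(default_factory=list)

    def finetune(self, trajectories: List[Trajectory]) -> None:
        self.training_data.extend(trajectories)

    def explore(self, task: str, rng: random.Random) -> Trajectory:
        # Placeholder rollout: a random number of dummy steps.
        n_steps = rng.randint(1, 5)
        return Trajectory(task, [f"step_{i}" for i in range(n_steps)])


def toy_judge(trajectory: Trajectory) -> bool:
    # Assumption: a rule replaces the general-purpose model that judges
    # whether a trajectory performed well.
    return len(trajectory.actions) <= 3


def run_cycle(
    agent: ToyAgent,
    demos: List[Trajectory],
    tasks: List[str],
    judge: Callable[[Trajectory], bool],
    iterations: int,
    seed: int = 0,
) -> List[int]:
    rng = random.Random(seed)
    agent.finetune(demos)  # imitation learning stage
    kept_per_iteration = []
    for _ in range(iterations):
        rollouts = [agent.explore(t, rng) for t in tasks]  # exploration
        good = [tr for tr in rollouts if judge(tr)]        # feedback
        agent.finetune(good)                               # optimization
        kept_per_iteration.append(len(good))
    return kept_per_iteration
```

In this sketch, only trajectories the judge accepts are fed back into training, which mirrors the abstract's point that the real setting has no ground-truth reward signal and relies on another model's judgment.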

