ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization

February 6, 2025
Authors: Yinjie Wang, Ling Yang, Guohao Li, Mengdi Wang, Bryon Aragam
cs.AI

Abstract

Recent research has leveraged large language model multi-agent systems for complex problem-solving while trying to reduce the manual effort required to build them, driving the development of automated agent workflow optimization methods. However, existing methods remain inflexible due to representational limitations, a lack of adaptability, and poor scalability when relying on discrete optimization techniques. We address these challenges with ScoreFlow, a simple yet high-performance framework that leverages efficient gradient-based optimization in a continuous space. ScoreFlow incorporates Score-DPO, a novel variant of the direct preference optimization method that accounts for quantitative feedback. Across six benchmarks spanning question answering, coding, and mathematical reasoning, ScoreFlow achieves an 8.2% improvement over existing baselines. Moreover, it empowers smaller models to outperform larger ones with lower inference costs. Project: https://github.com/Gen-Verse/ScoreFlow
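
To make the idea of preference optimization with quantitative feedback more concrete, here is a minimal sketch. The `dpo_loss` function follows the standard direct preference optimization objective for a single preferred/rejected pair. The `score_weighted_dpo_loss` function is only an illustrative assumption about how numeric evaluation scores could enter the loss (here, by scaling with the score gap); it is not the Score-DPO formulation from the paper, and all function and parameter names are hypothetical.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one (preferred, rejected) pair.

    logp_w / logp_l: policy log-probabilities of the preferred and
    rejected outputs; ref_logp_*: the same under the frozen reference model.
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)) equals softplus(-margin)
    return softplus(-margin)


def score_weighted_dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l,
                            score_w, score_l, beta=0.1):
    """Hypothetical variant: scale the DPO loss by the score gap.

    ASSUMPTION for illustration only. Pairs whose evaluation scores
    differ more contribute more to the loss; the paper's Score-DPO
    may incorporate scores differently.
    """
    weight = abs(score_w - score_l)
    return weight * dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta)
```

In this sketch, a pair of workflows with nearly identical scores yields almost no training signal, while a pair with a large score gap is weighted more heavily, which is one simple way to use quantitative rather than purely binary preference feedback.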

February 7, 2025