ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization

Abstract

Recent research has leveraged large language model (LLM) multi-agent systems for complex problem-solving while seeking to reduce the manual effort required to build them, which has driven the development of automated agent workflow optimization methods. However, existing methods remain inflexible: they suffer from representational limitations, a lack of adaptability, and poor scalability when relying on discrete optimization techniques. We address these challenges with ScoreFlow, a simple yet high-performance framework that leverages efficient gradient-based optimization in a continuous space. ScoreFlow incorporates Score-DPO, a novel variant of the direct preference optimization method that accounts for quantitative feedback. Across six benchmarks spanning question answering, coding, and mathematical reasoning, ScoreFlow achieves an 8.2% improvement over existing baselines. Moreover, it enables smaller models to outperform larger ones at lower inference cost. Project: https://github.com/Gen-Verse/ScoreFlow
