SDPO: Segment-Level Direct Preference Optimization for Social Agents

Abstract

Social agents powered by large language models (LLMs) can simulate human social behaviors but fall short in handling complex goal-oriented social dialogues. Direct Preference Optimization (DPO) has proven effective in aligning LLM behavior with human preferences across a variety of agent tasks. Existing DPO-based approaches for multi-turn interactions are divided into turn-level and session-level methods. Turn-level methods are overly fine-grained, focusing exclusively on individual turns, while session-level methods are too coarse-grained, often introducing training noise. To address these limitations, we propose Segment-Level Direct Preference Optimization (SDPO), which focuses on specific key segments within interactions to optimize multi-turn agent behavior while minimizing training noise. Evaluations on the SOTOPIA benchmark demonstrate that SDPO-tuned agents consistently outperform both existing DPO-based methods and proprietary LLMs like GPT-4o, underscoring SDPO's potential to advance the social intelligence of LLM-based agents. We release our code and data at https://github.com/AlibabaResearch/DAMO-ConvAI/tree/main/SDPO.
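As a rough illustration of what restricting a preference loss to a segment could look like, the sketch below applies the standard DPO objective to log-probabilities summed over a selected span of turns. The data layout, the function names, the shared [start, end) span for both dialogues, and the beta value are assumptions made for illustration only; they are not taken from the authors' implementation, which is available in the repository linked above.

```python
import math


def dpo_loss(pol_chosen, ref_chosen, pol_rejected, ref_rejected, beta=0.1):
    """Standard DPO loss for one preference pair, given summed log-probabilities."""
    margin = beta * ((pol_chosen - ref_chosen) - (pol_rejected - ref_rejected))
    # -log(sigmoid(margin)) == log(1 + exp(-margin)), computed in a stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def segment_logprob(turn_logprobs, start, end):
    """Sum the agent's per-turn log-probabilities over turns [start, end)."""
    return sum(turn_logprobs[start:end])


def segment_dpo_loss(chosen, rejected, start, end, beta=0.1):
    """DPO loss restricted to one segment of two dialogues (hypothetical layout).

    `chosen` and `rejected` are dicts with "policy" and "reference" lists,
    each holding one log-probability per agent turn. Turns outside the
    segment do not contribute to the loss.
    """
    return dpo_loss(
        segment_logprob(chosen["policy"], start, end),
        segment_logprob(chosen["reference"], start, end),
        segment_logprob(rejected["policy"], start, end),
        segment_logprob(rejected["reference"], start, end),
        beta=beta,
    )


# Toy per-turn log-probabilities (made up for the example).
chosen = {"policy": [-1.0, -2.0, -1.5], "reference": [-1.0, -2.5, -2.0]}
rejected = {"policy": [-1.0, -3.0, -2.0], "reference": [-1.0, -2.5, -1.5]}

# Only turns 1 and 2 form the segment; turn 0 is shared context and is ignored.
loss = segment_dpo_loss(chosen, rejected, start=1, end=3)
print(f"segment-level loss: {loss:.4f}")
```

A turn-level variant would correspond to a span of length one, and a session-level variant to a span covering every turn, which is one way to picture the granularity trade-off the abstract describes.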

