
CoSER: Coordinating LLM-Based Persona Simulation of Established Roles

February 13, 2025
Authors: Xintao Wang, Heng Wang, Yifei Zhang, Xinfeng Yuan, Rui Xu, Jen-tse Huang, Siyu Yuan, Haoran Guo, Jiangjie Chen, Wei Wang, Yanghua Xiao, Shuchang Zhou
cs.AI

Abstract

Role-playing language agents (RPLAs) have emerged as promising applications of large language models (LLMs). However, simulating established characters remains challenging for RPLAs because authentic character datasets, and nuanced evaluation methods that use such data, are lacking. In this paper, we present CoSER, a collection comprising a high-quality dataset, open models, and an evaluation protocol for effective RPLAs of established characters. The CoSER dataset covers 17,966 characters from 771 renowned books. It provides authentic dialogues with real-world intricacies, as well as diverse data types such as conversation setups, character experiences, and internal thoughts. Drawing on acting methodology, we introduce given-circumstance acting for training and evaluating role-playing LLMs, in which LLMs sequentially portray multiple characters in book scenes. Using our dataset, we develop CoSER 8B and CoSER 70B, advanced open role-playing LLMs built on LLaMA-3.1 models. Extensive experiments demonstrate the value of the CoSER dataset for RPLA training, evaluation, and retrieval. Moreover, CoSER 70B exhibits state-of-the-art performance, surpassing or matching GPT-4o on our evaluation and three existing benchmarks; for example, it achieves 75.80% and 93.47% accuracy on the InCharacter and LifeChoice benchmarks, respectively.
