
REALTALK: A 21-Day Real-World Dataset for Long-Term Conversation

February 18, 2025
Authors: Dong-Ho Lee, Adyasha Maharana, Jay Pujara, Xiang Ren, Francesco Barbieri
cs.AI

Abstract

Long-term, open-domain dialogue capabilities are essential for chatbots aiming to recall past interactions and demonstrate emotional intelligence (EI). Yet, most existing research relies on synthetic, LLM-generated data, leaving open questions about real-world conversational patterns. To address this gap, we introduce REALTALK, a 21-day corpus of authentic messaging app dialogues, providing a direct benchmark against genuine human interactions. We first conduct a dataset analysis, focusing on EI attributes and persona consistency to understand the unique challenges posed by real-world dialogues. By comparing with LLM-generated conversations, we highlight key differences, including diverse emotional expressions and variations in persona stability that synthetic dialogues often fail to capture. Building on these insights, we introduce two benchmark tasks: (1) persona simulation where a model continues a conversation on behalf of a specific user given prior dialogue context; and (2) memory probing where a model answers targeted questions requiring long-term memory of past interactions. Our findings reveal that models struggle to simulate a user solely from dialogue history, while fine-tuning on specific user chats improves persona emulation. Additionally, existing models face significant challenges in recalling and leveraging long-term context within real-world conversations.
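The abstract names the two benchmark tasks but does not specify how their instances are built. The sketch below shows one hypothetical way to cut them from a timestamped chat log.

It assumes each message carries a day index (1 to 21) and a speaker id. The names `Message`, `MemoryProbe`, `persona_simulation_instances`, `memory_probe_context` and the `evidence_day` field are illustrative only. They are not the REALTALK release format or the authors' code, and no scoring is included.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Message:
    day: int      # hypothetical: day of the 21-day conversation (1..21)
    speaker: str  # hypothetical: speaker identifier
    text: str


@dataclass
class MemoryProbe:
    question: str
    answer: str
    evidence_day: int  # hypothetical: day on which the answer was mentioned


def persona_simulation_instances(
    chat: List[Message], user: str
) -> List[Tuple[List[Message], Message]]:
    """Pair every turn by `user` with all earlier messages as context.

    A model would receive the context and be asked to produce the
    target turn on behalf of `user` (persona simulation, sketched).
    """
    instances = []
    for i, msg in enumerate(chat):
        # Skip a turn with no prior context (the very first message).
        if msg.speaker == user and i > 0:
            instances.append((chat[:i], msg))
    return instances


def memory_probe_context(
    chat: List[Message], probe: MemoryProbe, current_day: int
) -> List[Message]:
    """Return the history visible on `current_day` for a memory probe.

    The probe only tests long-term memory if its evidence appeared on
    an earlier day, so later or same-day evidence is rejected.
    """
    if probe.evidence_day >= current_day:
        raise ValueError("probe evidence must come from an earlier day")
    return [m for m in chat if m.day < current_day]
```

Using full-prefix context keeps the sketch simple. A real setup might truncate or summarize the history to fit a model's context window.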

February 20, 2025