Dialog2Flow: Pre-training Soft-Contrastive Action-Driven Sentence Embeddings for Automatic Dialog Flow Extraction
October 24, 2024
Authors: Sergio Burdisso, Srikanth Madikeri, Petr Motlicek
cs.AI
Abstract
Efficiently deriving structured workflows from unannotated dialogs remains an
underexplored and formidable challenge in computational linguistics. Automating
this process could significantly accelerate the manual design of workflows in
new domains and enable the grounding of large language models in
domain-specific flowcharts, enhancing transparency and controllability. In this
paper, we introduce Dialog2Flow (D2F) embeddings, which differ from
conventional sentence embeddings by mapping utterances to a latent space where
they are grouped according to their communicative and informative functions
(i.e., the actions they represent). D2F allows for modeling dialogs as
continuous trajectories in a latent space with distinct action-related regions.
By clustering D2F embeddings, the latent space is quantized, and dialogs can be
converted into sequences of region/action IDs, facilitating the extraction of
the underlying workflow. To pre-train D2F, we build a comprehensive dataset by
unifying twenty task-oriented dialog datasets with normalized per-turn action
annotations. We also introduce a novel soft contrastive loss that leverages the
semantic information of these actions to guide the representation learning
process, showing superior performance compared to standard supervised
contrastive loss. Evaluation against various sentence embeddings, including
dialog-specific ones, demonstrates that D2F yields superior qualitative and
quantitative results across diverse domains.
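The quantization step described in the abstract (cluster the utterance embeddings, replace each utterance with its cluster ID, then read the workflow off the resulting ID sequences) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the 2-D tuples stand in for real D2F embeddings, `kmeans` is a tiny deterministic k-means written only for this example, and `extract_flow` is a hypothetical helper that counts transitions between consecutive action IDs as a simple proxy for workflow edges.

```python
from collections import Counter
import math


def nearest(point, centroids):
    """Return the index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, k, iters=20):
    """Tiny deterministic k-means: farthest-point initialisation, then Lloyd updates."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Pick the point that is farthest from all centroids chosen so far.
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its group (keep it if the group is empty).
        centroids = [
            tuple(sum(x) / len(g) for x in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids


def extract_flow(dialogs, k):
    """Quantize utterance embeddings and count transitions between action IDs."""
    all_points = [emb for dialog in dialogs for emb in dialog]
    centroids = kmeans(all_points, k)
    # Each dialog becomes a sequence of region/action IDs.
    sequences = [[nearest(emb, centroids) for emb in dialog] for dialog in dialogs]
    transitions = Counter()
    for seq in sequences:
        transitions.update(zip(seq, seq[1:]))  # consecutive (from_id, to_id) pairs
    return sequences, transitions


# Toy stand-ins for utterance embeddings: three dialogs, three latent "action" regions.
dialogs = [
    [(0.0, 0.1), (5.0, 5.1), (9.9, 0.0)],
    [(0.1, 0.0), (5.1, 4.9), (10.0, 0.1)],
    [(0.0, 0.0), (10.0, 0.0)],
]
sequences, transitions = extract_flow(dialogs, k=3)
for (src, dst), count in sorted(transitions.items()):
    print(f"action {src} -> action {dst}: {count} dialog(s)")
```

In this toy setup the first two dialogs map to the same ID sequence, while the third skips the middle region, so the transition counts already hint at a main path and a shortcut. With real embeddings, the quality of the extracted flow depends on how well the embedding space separates actions, which is the property the abstract claims for D2F.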