Efficiently deriving structured workflows from unannotated dialogs remains an underexplored and formidable challenge in computational linguistics. Automating this process could substantially speed up the otherwise manual design of workflows in new domains and make it possible to ground large language models in domain-specific flowcharts, improving their transparency and controllability. In this paper, we introduce Dialog2Flow (D2F) embeddings, which differ from conventional sentence embeddings by mapping utterances to a latent space where they are grouped according to their communicative and informative functions (i.e., the actions they represent). D2F makes it possible to model dialogs as continuous trajectories through a latent space with distinct action-related regions. Clustering D2F embeddings quantizes this latent space, so that dialogs can be converted into sequences of region/action IDs from which the underlying workflow can be extracted. To pre-train D2F, we build a comprehensive dataset by unifying twenty task-oriented dialog datasets with normalized per-turn action annotations. We also introduce a novel soft contrastive loss that leverages the semantic information of these actions to guide the representation learning process and that outperforms the standard supervised contrastive loss. Compared with various sentence embeddings, including dialog-specific ones, D2F yields superior qualitative and quantitative results across diverse domains.
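The quantize-then-extract idea can be illustrated with a minimal sketch under several assumptions. Utterance embeddings are assumed to be precomputed; here they are replaced by hypothetical 2-D toy vectors. Plain k-means stands in for whatever clustering procedure is actually used. The "workflow" is represented as a hypothetical transition-count graph with `START`/`END` nodes. The helper names `kmeans`, `to_action_ids`, and `build_flow_graph` are illustrative only and are not part of any D2F release.

```python
from __future__ import annotations

from collections import Counter
from math import dist


def kmeans(points: list[tuple[float, ...]], k: int, iters: int = 50) -> list[tuple[float, ...]]:
    """Plain Lloyd's k-means (an assumed stand-in for the real clustering step).
    Centroids start at the first k points so the toy example is deterministic."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iters):
        clusters: list[list[tuple[float, ...]]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its members (keep it if empty).
        updated = [
            tuple(sum(xs) / len(members) for xs in zip(*members)) if members else centroids[c]
            for c, members in enumerate(clusters)
        ]
        if updated == centroids:  # assignments stable: converged
            break
        centroids = updated
    return centroids


def to_action_ids(dialog, centroids):
    """Quantize each utterance embedding to the ID of its nearest centroid (region)."""
    return [min(range(len(centroids)), key=lambda c: dist(u, centroids[c])) for u in dialog]


def build_flow_graph(sequences):
    """Count transitions between consecutive action IDs, framed by START/END nodes
    (a hypothetical workflow representation chosen for illustration)."""
    edges: Counter = Counter()
    for seq in sequences:
        path = ["START", *(str(a) for a in seq), "END"]
        edges.update(zip(path, path[1:]))
    return edges


# Hypothetical 2-D stand-ins for D2F utterance embeddings (real ones are high-dimensional).
dialogs = [
    [(0.0, 0.1), (5.0, 5.1), (10.0, 0.0)],  # e.g. greet -> request info -> confirm
    [(0.2, 0.0), (5.1, 4.9), (9.8, 0.2)],
    [(0.1, 0.2), (9.9, 0.1)],               # greet -> confirm
]
centroids = kmeans([u for d in dialogs for u in d], k=3)
sequences = [to_action_ids(d, centroids) for d in dialogs]
graph = build_flow_graph(sequences)
print(sequences)  # [[0, 1, 2], [0, 1, 2], [0, 2]]
```

In this toy run, each dialog becomes a short sequence of region IDs, and the edge counts in `graph` show the shared path `0 -> 1 -> 2` as well as the shortcut `0 -> 2`. In practice, the number of clusters and any pruning of rare edges would be design choices that this sketch does not attempt to reproduce.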