This work investigates unsupervised approaches to two central challenges in designing task-oriented dialog schemas: assigning an intent label to each dialog turn (intent clustering) and generating a set of intents from the resulting clusters (intent induction). We postulate that two factors are salient for automatic intent induction: (1) the clustering algorithm used for intent labeling and (2) the user utterance embedding space. We compare existing off-the-shelf clustering models and embeddings on the DSTC11 evaluation. Our extensive experiments demonstrate that the utterance embedding and the clustering method should be selected jointly, and with care, in the intent induction task. We also show that pretrained MiniLM with Agglomerative clustering yields significant improvements in NMI, ARI, F1, accuracy, and example coverage on intent induction tasks. The source code is available at https://github.com/Jeiyoon/dstc11-track2.
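The intent clustering step and two of the reported metrics (NMI and ARI) can be illustrated with a minimal, dependency-free sketch. It is not the implementation from the repository above. The toy 3-dimensional vectors stand in for MiniLM utterance embeddings, and the intent names, the cosine distance, the average linkage, and the arithmetic-mean normalization of NMI are all assumptions made for illustration. In practice one would more likely use an off-the-shelf implementation such as scikit-learn's `AgglomerativeClustering`.

```python
import math
from collections import Counter
from itertools import combinations


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def agglomerative(vectors, n_clusters):
    """Average-linkage agglomerative clustering; returns one label per vector."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Average pairwise distance between members of the two clusters.
            dists = [cosine_distance(vectors[i], vectors[j])
                     for i in clusters[a] for j in clusters[b]]
            avg = sum(dists) / len(dists)
            if best is None or avg < best[0]:
                best = (avg, a, b)
        _, a, b = best
        clusters[a] += clusters[b]  # merge the closest pair (a < b)
        del clusters[b]
    labels = [0] * len(vectors)
    for cluster_id, members in enumerate(clusters):
        for i in members:
            labels[i] = cluster_id
    return labels


def entropy(labels):
    """Shannon entropy (natural log) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(true, pred):
    """Normalized mutual information, normalized by the mean of the entropies."""
    n = len(true)
    count_t, count_p = Counter(true), Counter(pred)
    joint = Counter(zip(true, pred))
    mi = sum((c / n) * math.log(c * n / (count_t[t] * count_p[p]))
             for (t, p), c in joint.items())
    denom = (entropy(true) + entropy(pred)) / 2
    return mi / denom if denom > 0 else 1.0


def ari(true, pred):
    """Adjusted Rand index computed from the contingency counts."""
    def pairs(x):
        return x * (x - 1) / 2
    sum_joint = sum(pairs(c) for c in Counter(zip(true, pred)).values())
    sum_t = sum(pairs(c) for c in Counter(true).values())
    sum_p = sum(pairs(c) for c in Counter(pred).values())
    expected = sum_t * sum_p / pairs(len(true))
    max_index = (sum_t + sum_p) / 2
    if max_index == expected:
        return 1.0
    return (sum_joint - expected) / (max_index - expected)


# Toy stand-ins for utterance embeddings (assumption: real ones come from MiniLM).
vectors = [
    [1.0, 0.1, 0.0], [0.9, 0.2, 0.0],
    [0.0, 1.0, 0.1], [0.1, 0.9, 0.0],
    [0.0, 0.1, 1.0], [0.1, 0.0, 0.9],
]
# Hypothetical gold intent names, used only to score the clustering.
gold = ["check_balance", "check_balance", "reset_password",
        "reset_password", "cancel_order", "cancel_order"]

predicted = agglomerative(vectors, n_clusters=3)
print("labels:", predicted)
print("NMI:", nmi(gold, predicted))
print("ARI:", ari(gold, predicted))
```

Both metrics depend only on how utterances are grouped, not on the cluster ids themselves, which is why integer cluster labels can be scored directly against string intent names. Swapping the embedding or the linkage changes `predicted` and therefore the scores, which is the joint sensitivity the abstract describes.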