The focus of this work is to investigate unsupervised approaches to two quintessential challenges in designing task-oriented dialog schemas: assigning an intent label to each dialog turn (intent clustering) and generating a set of intents from the resulting clusters (intent induction). We postulate that two factors are salient for the automatic induction of intents: (1) the clustering algorithm used for intent labeling and (2) the embedding space of user utterances. We compare existing off-the-shelf clustering models and embeddings using the DSTC11 evaluation. Our extensive experiments demonstrate that the utterance embedding and the clustering method should be selected jointly, with careful consideration, for the intent induction task. We also show that pretrained MiniLM embeddings combined with agglomerative clustering yield significant improvements in NMI, ARI, F1, accuracy, and example coverage on intent induction tasks. The source code is available at https://github.com/Jeiyoon/dstc11-track2.
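The pipeline described above (embed each user utterance, cluster the embeddings, then score the clusters against gold intents with NMI and ARI) can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation. Several assumptions apply: the three-dimensional vectors are hypothetical stand-ins for real MiniLM sentence embeddings, average linkage with cosine distance is one possible agglomerative configuration chosen for illustration, NMI is normalized by the arithmetic mean of the two entropies, and the helper names (`agglomerative_labels`, `nmi`, `ari`) are hypothetical. F1, accuracy, and example coverage are omitted.

```python
import math
from collections import Counter
from itertools import combinations


def cosine_distance(u, v):
    """1 - cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def agglomerative_labels(embeddings, n_clusters):
    """Average-linkage agglomerative clustering (assumed config); one label per utterance."""
    pair_dist = {
        (i, j): cosine_distance(embeddings[i], embeddings[j])
        for i, j in combinations(range(len(embeddings)), 2)
    }

    def dist(i, j):
        return pair_dist[(min(i, j), max(i, j))]

    def linkage(a, b):
        # Average linkage: mean pairwise distance between members of two clusters.
        return sum(dist(i, j) for i in a for j in b) / (len(a) * len(b))

    clusters = [[i] for i in range(len(embeddings))]  # start from singletons
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        clusters[a].extend(clusters[b])  # merge the closest pair (a < b)
        del clusters[b]

    labels = [0] * len(embeddings)
    for k, members in enumerate(clusters):
        for i in members:
            labels[i] = k
    return labels


def nmi(gold, pred):
    """Mutual information normalized by the arithmetic mean of the two entropies."""
    n = len(gold)
    count_g, count_p = Counter(gold), Counter(pred)
    mi = sum(
        c / n * math.log(n * c / (count_g[g] * count_p[p]))
        for (g, p), c in Counter(zip(gold, pred)).items()
    )

    def entropy(counts):
        return -sum(c / n * math.log(c / n) for c in counts.values())

    denom = (entropy(count_g) + entropy(count_p)) / 2
    return mi / denom if denom > 0 else 1.0


def ari(gold, pred):
    """Adjusted Rand index computed from pair counts."""
    def pairs(x):
        return x * (x - 1) / 2

    sum_joint = sum(pairs(c) for c in Counter(zip(gold, pred)).values())
    sum_g = sum(pairs(c) for c in Counter(gold).values())
    sum_p = sum(pairs(c) for c in Counter(pred).values())
    expected = sum_g * sum_p / pairs(len(gold))
    max_index = (sum_g + sum_p) / 2
    if max_index == expected:
        return 1.0
    return (sum_joint - expected) / (max_index - expected)


# Hypothetical toy vectors standing in for MiniLM utterance embeddings.
embeddings = [
    [1.0, 0.1, 0.0],  # "I want to book a flight"
    [0.9, 0.2, 0.0],  # "Reserve a plane ticket for me"
    [0.1, 1.0, 0.1],  # "What's my account balance?"
    [0.0, 0.9, 0.2],  # "How much money do I have?"
    [0.0, 0.1, 1.0],  # "Cancel my order"
    [0.1, 0.0, 0.9],  # "I no longer want that purchase"
]
gold = ["book_flight", "book_flight", "check_balance",
        "check_balance", "cancel_order", "cancel_order"]

pred = agglomerative_labels(embeddings, n_clusters=3)
print(pred, round(nmi(gold, pred), 3), round(ari(gold, pred), 3))
# → [0, 0, 1, 1, 2, 2] 1.0 1.0
```

Here the number of clusters is fixed by hand; in a real induction setting it is itself unknown, and swapping in a different embedding model changes the geometry the clustering operates on, which is the interaction between the two factors that the abstract highlights.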