We present our work on Track 2 of the Dialog System Technology Challenges 11 (DSTC11). DSTC11-Track2 aims to provide a benchmark for zero-shot, cross-domain intent-set induction. In the absence of an in-domain training dataset, inducing users' intentions requires a robust utterance representation that transfers across domains. To achieve this, we fine-tuned a language model on a multi-domain dialogue dataset and proposed extracting Verb-Object pairs from utterances to filter out unnecessary information. Furthermore, we devised a method that generates a name for each cluster to make the clustering results explainable. Our approach ranked 3rd in precision and achieved higher accuracy and normalized mutual information (NMI) than the baseline model on datasets from several domains.
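For readers unfamiliar with the NMI metric mentioned above, the following is a minimal sketch of how it can be computed from reference intent labels and induced cluster assignments. The abstract does not state which normalization is used; the arithmetic-mean form, 2·I(U;V) / (H(U) + H(V)), is assumed here because it is the default in scikit-learn's `normalized_mutual_info_score`. The function name and the example labels are illustrative only and are not taken from the paper.

```python
import math
from collections import Counter


def normalized_mutual_information(labels_true, labels_pred):
    """NMI with arithmetic-mean normalization (an assumption, see lead-in).

    labels_true: reference intent label for each utterance
    labels_pred: induced cluster id for each utterance
    """
    n = len(labels_true)
    if n == 0 or n != len(labels_pred):
        raise ValueError("label lists must be non-empty and of equal length")

    count_true = Counter(labels_true)
    count_pred = Counter(labels_pred)
    count_joint = Counter(zip(labels_true, labels_pred))

    def entropy(counts):
        # Shannon entropy (natural log) of the empirical label distribution.
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    h_true = entropy(count_true)
    h_pred = entropy(count_pred)

    # I(U;V) = sum over joint cells of p(u,v) * log(p(u,v) / (p(u) * p(v))).
    mutual_info = sum(
        (c / n) * math.log(c * n / (count_true[t] * count_pred[p]))
        for (t, p), c in count_joint.items()
    )

    if h_true + h_pred == 0.0:
        # Both labelings put everything in a single group: treat as identical.
        return 1.0
    return 2.0 * mutual_info / (h_true + h_pred)


if __name__ == "__main__":
    # Hypothetical intents and cluster ids for four utterances.
    reference = ["book_flight", "book_flight", "cancel", "cancel"]
    induced = [1, 1, 0, 0]
    print(normalized_mutual_information(reference, induced))
```

Because NMI depends only on how utterances are grouped, not on the cluster ids themselves, it can score induced clusters against reference intents without first mapping cluster ids to intent names.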