State-of-the-art models for intent induction require annotated datasets. However, annotating dialogues is time-consuming, laborious, and expensive. In this work, we propose a fully unsupervised framework for inducing intents from dialogues. We also show how pre-processing the dialogue corpora improves results. Finally, we show how to extract dialogue flows of intents by analyzing the most common intent sequences. Although we evaluate our framework on the MultiWOZ dataset, it requires no prior knowledge, which makes it applicable to any use case and highly relevant to real-world customer support applications across industries.
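The flow-extraction step can be illustrated with a minimal sketch. It assumes each dialogue has already been mapped to a sequence of induced intent labels, one per turn (for example, cluster ids produced by the unsupervised induction step). The sketch then counts contiguous length-`n` label sequences across the corpus. The helper name `most_common_flows`, its parameters, and the example labels are hypothetical and are not taken from the framework itself.

```python
from collections import Counter
from typing import Hashable, Sequence


def most_common_flows(
    dialogues: Sequence[Sequence[Hashable]], n: int = 3, top_k: int = 5
) -> list[tuple[tuple[Hashable, ...], int]]:
    """Return the top_k most frequent contiguous length-n intent sequences.

    Hypothetical helper: each dialogue is assumed to be a list of induced
    intent labels (e.g. cluster ids), one label per turn.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    counts: Counter = Counter()
    for turns in dialogues:
        # Slide a window of size n over the turn-level intent labels;
        # dialogues shorter than n contribute nothing.
        for i in range(len(turns) - n + 1):
            counts[tuple(turns[i : i + n])] += 1
    # Counter.most_common sorts by count (ties keep first-seen order).
    return counts.most_common(top_k)


if __name__ == "__main__":
    # Illustrative, made-up intent labels for three dialogues.
    dialogues = [
        ["greet", "find_hotel", "book_hotel", "thank"],
        ["greet", "find_hotel", "book_hotel", "bye"],
        ["find_train", "book_train", "thank"],
    ]
    print(most_common_flows(dialogues, n=3, top_k=1))
```

On the toy input, the flow `("greet", "find_hotel", "book_hotel")` appears in two dialogues and is ranked first. Plain n-gram counting is only one way to surface frequent flows. Sequential pattern mining or a transition graph over intents would be natural alternatives when flows contain optional or repeated turns.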