Intent detection is one of the tasks of the Natural Language Understanding (NLU) unit in task-oriented dialogue systems. Out-of-scope (OOS) and out-of-domain (OOD) inputs can cause these systems to fail. In addition, training an intent detection model requires a labeled dataset, and creating one is time-consuming and requires human effort. This article addresses both problems. The task of identifying OOD/OOS inputs is called OOD/OOS intent detection, while discovering new intents and pseudo-labeling OOD inputs is known as intent discovery. For OOD intent detection, we use a variational autoencoder (VAE) to distinguish between known and unknown intents independently of the input data distribution. An unsupervised clustering method is then used to discover the different unknown intents underlying the OOD/OOS inputs. We also apply non-linear dimensionality reduction to the OOD/OOS representations so that the distances between them are more meaningful for clustering. Our results show that the proposed model achieves strong results on both OOD/OOS intent detection and intent discovery and outperforms the baselines in English and Persian.
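The two-stage flow described in the abstract (flag unknown inputs, then group them into candidate intents) can be illustrated with a minimal sketch. It rests on several assumptions that are not from the article: the per-utterance scores stand in for something like a VAE reconstruction error, the 2-D points stand in for representations after non-linear dimensionality reduction, the 95% quantile cutoff is an arbitrary choice, and k-means is swapped in because the abstract does not name the clustering method. All function names and values are hypothetical.

```python
import random


def fit_threshold(known_scores, quantile=0.95):
    """Return a cutoff at the given quantile of scores from known-intent inputs."""
    ordered = sorted(known_scores)
    index = min(len(ordered) - 1, int(quantile * len(ordered)))
    return ordered[index]


def detect_ood(scores, threshold):
    """Return indices of inputs whose score is above the cutoff (treated as OOD/OOS)."""
    return [i for i, score in enumerate(scores) if score > threshold]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means (an assumed stand-in clustering method); one label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels


# Hypothetical scores for utterances with known intents (placeholder values).
known_scores = [0.10, 0.12, 0.08, 0.11, 0.09, 0.13, 0.10, 0.12, 0.07, 0.11]
threshold = fit_threshold(known_scores)

# Hypothetical scores and reduced 2-D representations for new utterances.
new_scores = [0.09, 0.55, 0.11, 0.60, 0.58, 0.62]
reduced = [(1.0, 1.0), (0.0, 0.1), (1.1, 0.9), (5.0, 5.1), (0.2, 0.0), (5.1, 4.9)]

# Stage 1: flag inputs that look unlike the known intents.
ood_indices = detect_ood(new_scores, threshold)

# Stage 2: cluster only the flagged inputs and use cluster ids as pseudo-labels.
ood_points = [reduced[i] for i in ood_indices]
labels = kmeans(ood_points, k=2)
pseudo_labels = dict(zip(ood_indices, labels))
print(ood_indices, pseudo_labels)
```

In this toy setup the number of clusters is fixed by hand; in practice the number of unknown intents is usually not known in advance, so a method that estimates it, or a density-based clusterer, may be a better fit.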