Detecting out-of-domain (OOD) intents in user queries is essential for a task-oriented dialogue system. Previous OOD detection studies generally assume that plenty of labeled in-domain (IND) intents are available. In this paper, we focus on a more practical few-shot OOD setting in which only a few labeled IND examples are available, along with a large amount of unlabeled mixed data that may belong to either IND or OOD. This scenario poses two key challenges: learning discriminative representations from limited IND data and leveraging the unlabeled mixed data. We therefore propose an adaptive prototypical pseudo-labeling (APP) method for few-shot OOD detection. It comprises a prototypical OOD detection framework (ProtoOOD), which facilitates low-resource OOD detection using limited IND data, and an adaptive pseudo-labeling method, which produces high-quality pseudo OOD and IND labels. Extensive experiments and analysis demonstrate the effectiveness of our method for few-shot OOD detection.
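To make the two components concrete, the following is a minimal sketch of prototype-based OOD scoring combined with threshold-based pseudo-labeling. It is not the authors' implementation: the abstract does not specify the encoder, the distance function, or how the thresholds adapt. The sketch assumes precomputed embeddings, mean-of-class prototypes, Euclidean distance, and two fixed thresholds (`ind_thresh`, `ood_thresh`); all function and parameter names are hypothetical.

```python
import numpy as np


def class_prototypes(embeddings, labels):
    """Return (classes, prototypes); each prototype is the mean embedding of
    one IND class. Mean pooling is an assumption, not taken from the paper."""
    classes = np.unique(labels)
    protos = np.stack([embeddings[labels == c].mean(axis=0) for c in classes])
    return classes, protos


def nearest_prototype(embeddings, protos):
    """Return the Euclidean distance to, and index of, the closest prototype
    for every embedding."""
    diffs = embeddings[:, None, :] - protos[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    return dists.min(axis=1), dists.argmin(axis=1)


def pseudo_label(embeddings, classes, protos, ind_thresh, ood_thresh):
    """Assign a pseudo label to each unlabeled embedding:
    the nearest class id if the distance is at most ind_thresh,
    -1 (pseudo OOD) if it is at least ood_thresh,
    None (left unlabeled) otherwise.
    Fixed thresholds stand in for the adaptive scheme, which is unspecified."""
    dist, nearest = nearest_prototype(embeddings, protos)
    result = []
    for d, n in zip(dist, nearest):
        if d <= ind_thresh:
            result.append(int(classes[n]))
        elif d >= ood_thresh:
            result.append(-1)
        else:
            result.append(None)
    return result


# Toy 2-D "embeddings": two IND classes with two labeled examples each.
labeled_x = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
labeled_y = np.array([0, 0, 1, 1])
unlabeled_x = np.array([[0.1, 0.4], [5.0, 5.4], [20.0, 20.0], [2.5, 3.0]])

classes, protos = class_prototypes(labeled_x, labeled_y)
pseudo = pseudo_label(unlabeled_x, classes, protos, ind_thresh=1.0, ood_thresh=5.0)
```

In this toy run, points close to a prototype receive that class as a pseudo IND label, the distant point is marked as pseudo OOD, and the ambiguous point between the two prototypes is left unlabeled. In a real pipeline, the confident pseudo labels would be fed back into training, and the thresholds would be tuned or adapted rather than fixed.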