Identifying intents from dialogue utterances is an integral component of task-oriented dialogue systems. Intent-related tasks are typically formulated either as classification, where utterances are assigned to predefined categories, or as clustering, where new and previously unknown intent categories must be discovered from the utterances. Intent classification may further be modeled in a multiclass (MC) or multilabel (ML) setup. While these tasks are typically modeled separately, we propose IntenDD, a unified approach that leverages a shared utterance encoding backbone. IntenDD uses an entirely unsupervised contrastive learning strategy for representation learning, in which pseudo-labels for the unlabeled utterances are generated from their lexical features. Additionally, we introduce a two-step post-processing setup for the classification tasks using modified adsorption: first, the residuals in the training data are propagated, and then the labels are smoothed, with both steps modeled in a transductive setting. Through extensive evaluations on various benchmark datasets, we find that our approach consistently outperforms competitive baselines across all three tasks. On average, IntenDD reports percentage improvements of 2.32%, 1.26%, and 1.52% in the respective metrics for the few-shot MC, few-shot ML, and intent discovery tasks.
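The two-step post-processing described in the abstract (propagate training residuals, then smooth the labels, both transductively) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not IntenDD's implementation: it substitutes plain iterative label propagation on a row-normalized affinity graph for modified adsorption, and the function names (`row_normalize`, `propagate`, `post_process`), the parameters `alpha` and `iters`, and the toy graph are all hypothetical.

```python
import numpy as np


def row_normalize(W):
    """Scale each row of a non-negative affinity matrix to sum to 1."""
    sums = W.sum(axis=1, keepdims=True)
    return W / np.where(sums == 0, 1.0, sums)


def propagate(P, seed, clamp_mask, alpha=0.8, iters=50):
    """Iterate X <- alpha * P @ X + (1 - alpha) * seed.

    Rows selected by clamp_mask are reset to their seed values after
    every step, so known (training) values are never overwritten.
    """
    X = seed.copy()
    for _ in range(iters):
        X = alpha * (P @ X) + (1 - alpha) * seed
        X[clamp_mask] = seed[clamp_mask]
    return X


def post_process(W, probs, labels_onehot, train_mask, alpha=0.8, iters=50):
    """Two-step transductive post-processing (simplified stand-in).

    Assumption: simple label propagation replaces modified adsorption.
    """
    P = row_normalize(W)

    # Step 1: spread the training residuals (label minus prediction)
    # over the graph and use them to correct the base predictions.
    residual = np.zeros_like(probs)
    residual[train_mask] = labels_onehot[train_mask] - probs[train_mask]
    corrected = probs + propagate(P, residual, train_mask, alpha, iters)

    # Step 2: reset training rows to their true labels, then smooth
    # the corrected scores over the same graph.
    corrected[train_mask] = labels_onehot[train_mask]
    return propagate(P, corrected, train_mask, alpha, iters)


# Toy example: utterances 0-1 and 2-3 form two tightly linked pairs,
# with only a weak edge between utterances 1 and 2.
W = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.1, 0.0],
    [0.0, 0.1, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
])
probs = np.full((4, 2), 0.5)                # uninformative base classifier
labels_onehot = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
train_mask = np.array([True, False, True, False])

scores = post_process(W, probs, labels_onehot, train_mask)
pred = scores.argmax(axis=1)                # multiclass decision
print(pred)
```

In this toy graph the unlabeled utterances take the class of their strongly connected labeled neighbor. For the multilabel setup, one would presumably threshold each column of `scores` instead of taking the argmax; the threshold itself would be another assumption.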