New Intent Discovery (NID) aims to recognize both known and new intents in unlabeled data with the aid of limited labeled data that covers only known intents. Because they ignore structural relationships between samples, previous methods generate noisy supervisory signals that cannot balance quantity against quality, which hinders the formation of new intent clusters and the effective transfer of pre-training knowledge. To mitigate this limitation, we propose a novel Diffusion Weighted Graph Framework (DWGF) that captures both the semantic similarities and the structural relationships inherent in the data, yielding more abundant and more reliable supervisory signals. Specifically, for each sample, we diffuse neighborhood relationships along semantic paths guided by its nearest neighbors for multiple hops, so as to characterize its local structure discriminatively. We then sample its positive keys and weight them according to semantic similarity and local structure for contrastive learning. During inference, we further propose a Graph Smoothing Filter (GSF) that explicitly uses the structural relationships to filter out the high-frequency noise carried by semantically ambiguous samples on cluster boundaries. Extensive experiments show that our method outperforms state-of-the-art models on all evaluation metrics across multiple benchmark datasets. Code and data are available at https://github.com/yibai-shi/DWGF.
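The three steps named in the abstract (multi-hop diffusion over nearest neighbors, similarity- and structure-based weighting of positive keys, and graph smoothing at inference) can be illustrated with a minimal NumPy sketch. This is not the released DWGF implementation: the function names, the geometric `decay` factor, the product-form weighting, and the low-pass filter `I - 0.5 L` are all assumptions chosen for illustration, and the toy embeddings stand in for sentence representations.

```python
import numpy as np


def knn_adjacency(emb, k):
    """Return (cosine similarity matrix, binary kNN adjacency) for row embeddings."""
    x = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = x @ x.T
    masked = sim.copy()
    np.fill_diagonal(masked, -np.inf)          # a sample is not its own neighbor
    idx = np.argsort(-masked, axis=1)[:, :k]   # indices of the k most similar samples
    adj = np.zeros_like(sim)
    np.put_along_axis(adj, idx, 1.0, axis=1)
    return sim, adj


def diffuse(adj, hops, decay=0.5):
    """Accumulate reachability along kNN paths; `decay` is an assumed hop discount."""
    reach = np.zeros_like(adj)
    step = np.eye(adj.shape[0])
    for h in range(hops):
        step = (step @ adj > 0).astype(float)  # reachable by a walk of h + 1 edges
        reach += (decay ** h) * step
    np.fill_diagonal(reach, 0.0)               # a sample is not its own positive key
    return reach


def positive_key_weights(sim, reach):
    """Assumed weighting: similarity times structure score, normalized per row."""
    w = np.clip(sim, 0.0, None) * reach
    total = w.sum(axis=1, keepdims=True)
    return np.divide(w, total, out=np.zeros_like(w), where=total > 0)


def graph_smoothing_filter(emb, adj, steps=2):
    """Generic low-pass graph filter (I - 0.5 L)^steps applied to the embeddings."""
    n = adj.shape[0]
    a = np.maximum(adj, adj.T) + np.eye(n)     # symmetrize and add self-loops
    d = a.sum(axis=1)
    a_norm = a / np.sqrt(np.outer(d, d))       # D^{-1/2} A D^{-1/2}
    filt = np.eye(n) - 0.5 * (np.eye(n) - a_norm)
    out = emb
    for _ in range(steps):
        out = filt @ out
    return out


# Toy data: two well-separated clusters of 10 points each (hypothetical embeddings).
rng = np.random.default_rng(0)
centers = np.array([[5.0, 0.0], [0.0, 5.0]])
emb = np.vstack([c + 0.3 * rng.standard_normal((10, 2)) for c in centers])

sim, adj = knn_adjacency(emb, k=3)
weights = positive_key_weights(sim, diffuse(adj, hops=2))
smoothed = graph_smoothing_filter(emb, adj, steps=2)
```

In this sketch, each row of `weights` is a distribution over candidate positive keys that could be used to sample and weight positives in a contrastive loss, and `smoothed` would be passed to the clustering step in place of the raw embeddings.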