Identifying unexpected domain-shifted instances is crucial for real-world natural language processing applications. Previous work identifies out-of-distribution (OOD) instances using a single global feature embedding to represent the sentence, which cannot characterize subtle OOD patterns well. Another major challenge for current OOD methods is learning effective low-dimensional sentence representations that can identify hard OOD instances, i.e., those that are semantically similar to the in-distribution (ID) data. In this paper, we propose a new unsupervised OOD detection method, Semantic Role Labeling Guided Out-of-distribution Detection (SRLOOD), which separates, extracts, and learns semantic role labeling (SRL) guided fine-grained local feature representations from the different arguments of a sentence, together with global feature representations of the full sentence, using a margin-based contrastive loss. We also introduce a novel self-supervised approach that enhances this global-local feature learning by predicting the SRL-extracted role. The resulting model achieves state-of-the-art performance on four OOD benchmarks, indicating the effectiveness of our approach. Code will be made available upon acceptance.
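The abstract names a margin-based contrastive loss but does not give its exact form. As a rough illustration only, a generic pairwise margin-based contrastive loss might be sketched as follows. The function names, the Euclidean distance, the default margin of 1.0, and the way pairs are labeled as "same" or "different" are all assumptions made for this sketch, not details taken from SRLOOD.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def margin_contrastive_loss(u, v, same, margin=1.0):
    """Generic pairwise margin-based contrastive loss (hypothetical helper).

    Assumption: `same` is True when the two embeddings should be pulled
    together and False when they should be pushed at least `margin` apart.
    """
    d = euclidean(u, v)
    if same:
        # Matching pairs are penalized by their squared distance.
        return d ** 2
    # Non-matching pairs are penalized only while closer than the margin.
    return max(0.0, margin - d) ** 2


# Toy vectors standing in for a global sentence embedding and a local
# (argument-level) embedding; the values are made up for illustration.
global_feat = [0.0, 0.0]
local_feat = [0.6, 0.0]
loss_pull = margin_contrastive_loss(global_feat, local_feat, same=True)
loss_push = margin_contrastive_loss(global_feat, local_feat, same=False)
```

In this form, non-matching pairs that are already farther apart than the margin contribute zero loss, so training effort concentrates on pairs that are still hard to separate.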