This paper introduces a novel method based on bi-encoder detectors, together with a comprehensive study comparing out-of-distribution (OOD) detection methods in NLP across different feature extractors. The feature extraction stage uses popular models such as the Universal Sentence Encoder (USE), BERT, MPNet, and GloVe to extract informative representations from textual data. The evaluation is conducted on several datasets: CLINC150, ROSTD-Coarse, SNIPS, and YELLOW. Performance is assessed with F1-Score, MCC, FPR@90, FPR@95, AUPR, and AUROC. The experimental results show that the proposed bi-encoder-based detectors outperform the other methods on all datasets. This holds both for methods that require OOD labels during training and for those that do not, which indicates strong potential for OOD detection in NLP. Their simple training process and superior detection performance make them well suited to real-world applications. The presented methods and benchmarking metrics provide a resource for future research on OOD detection and can support further advances in the field. The code and implementation details are available in our GitHub repository: https://github.com/yellowmessenger/ood-detection.
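Two of the threshold-free metrics named above, AUROC and FPR at a fixed true-positive rate (FPR@90, FPR@95), can be illustrated with a minimal sketch. It assumes one common convention that the abstract does not spell out: OOD is the positive class, a higher detector score means "more likely OOD", and FPR@k is the fraction of in-domain examples flagged as OOD at the threshold that detects k% of OOD examples. The function names are hypothetical and are not taken from the authors' repository.

```python
import math
from typing import Sequence


def auroc(id_scores: Sequence[float], ood_scores: Sequence[float]) -> float:
    """AUROC via the Mann-Whitney U statistic: the probability that a random
    OOD example scores higher than a random in-domain example (ties count 0.5).
    Assumes OOD is the positive class and higher scores mean 'more OOD'."""
    wins = 0.0
    for o in ood_scores:
        for i in id_scores:
            if o > i:
                wins += 1.0
            elif o == i:
                wins += 0.5
    return wins / (len(ood_scores) * len(id_scores))


def fpr_at_tpr(id_scores: Sequence[float], ood_scores: Sequence[float],
               tpr: float = 0.95) -> float:
    """Fraction of in-domain examples scored >= the threshold at which at least
    `tpr` of the OOD examples are detected (e.g. tpr=0.95 gives FPR@95)."""
    ranked = sorted(ood_scores, reverse=True)
    # Number of OOD examples that must be detected; the small epsilon guards
    # against floating-point error such as 0.95 * 20 evaluating just above 19.
    k = max(1, math.ceil(tpr * len(ranked) - 1e-9))
    threshold = ranked[k - 1]
    false_positives = sum(1 for s in id_scores if s >= threshold)
    return false_positives / len(id_scores)


if __name__ == "__main__":
    # Toy scores for illustration only; these are not results from the paper.
    in_domain = [0.1, 0.2, 0.3, 0.4]
    out_of_domain = [0.35, 0.5, 0.6, 0.7]
    print("AUROC :", auroc(in_domain, out_of_domain))
    print("FPR@90:", fpr_at_tpr(in_domain, out_of_domain, 0.90))
    print("FPR@95:", fpr_at_tpr(in_domain, out_of_domain, 0.95))
```

The quadratic pairwise AUROC keeps the definition transparent. For large evaluation sets, a library routine such as scikit-learn's `roc_auc_score` is the practical choice.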