This paper introduces a novel method based on bi-encoder detectors, together with a comprehensive study comparing out-of-distribution (OOD) detection methods in NLP across different feature extractors. The feature extraction stage employs popular methods such as Universal Sentence Encoder (USE), BERT, MPNET, and GLOVE to extract informative representations from textual data. The evaluation is conducted on several datasets, including CLINC150, ROSTD-Coarse, SNIPS, and YELLOW. Performance is assessed using F1-Score, MCC, FPR@90, FPR@95, AUPR, and AUROC. The experimental results demonstrate that the proposed bi-encoder-based detectors outperform other methods across all datasets, including both methods that require OOD labels in training and methods that do not, showing great potential for OOD detection in NLP. The simplicity of the training process and the superior detection performance make them applicable to real-world scenarios. The presented methods and benchmarking metrics serve as a valuable resource for future research in OOD detection. The code and implementation details can be found in our GitHub repository: https://github.com/yellowmessenger/ood-detection.
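For readers unfamiliar with the threshold-based metrics listed above, the following is a minimal sketch of how FPR@95 and AUROC are commonly computed from detector scores. It is not taken from the linked repository. The function names `fpr_at_tpr` and `auroc` are hypothetical, and the sketch assumes that OOD is treated as the positive class and that a higher score means "more likely OOD"; some works use the opposite convention.

```python
import math


def fpr_at_tpr(scores, is_ood, tpr_target=0.95):
    """False positive rate at the threshold that reaches the target TPR.

    Assumption: OOD is the positive class and higher scores mean "more OOD".
    """
    ood = sorted((s for s, o in zip(scores, is_ood) if o), reverse=True)
    ind = [s for s, o in zip(scores, is_ood) if not o]
    # Number of top-scoring OOD samples that must be flagged to reach the target TPR.
    k = max(1, math.ceil(tpr_target * len(ood)))
    threshold = ood[k - 1]
    # In-domain samples at or above the threshold are false positives.
    return sum(s >= threshold for s in ind) / len(ind)


def auroc(scores, is_ood):
    """AUROC as the probability that a random OOD sample outscores a random
    in-domain sample, counting ties as one half."""
    ood = [s for s, o in zip(scores, is_ood) if o]
    ind = [s for s, o in zip(scores, is_ood) if not o]
    wins = 0.0
    for p in ood:
        for n in ind:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(ood) * len(ind))


# Toy scores (made up for illustration, not results from the paper).
toy_scores = [0.9, 0.8, 0.7, 0.1, 0.2, 0.75]
toy_is_ood = [True, True, True, False, False, False]
toy_fpr95 = fpr_at_tpr(toy_scores, toy_is_ood, 0.95)
toy_auroc = auroc(toy_scores, toy_is_ood)
```

FPR@90 follows from the same function with `tpr_target=0.90`. The pairwise AUROC loop is quadratic in the number of samples, so it is meant for illustration only; a library routine would be preferable on full test sets.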