State-of-the-art models can perform well in controlled environments, but they often struggle when presented with out-of-distribution (OOD) examples, making OOD detection a critical component of NLP systems. In this paper, we highlight the limitations of existing approaches to OOD detection in NLP. Specifically, we evaluate eight OOD detection methods that are easy to integrate into existing NLP systems and require no additional OOD data or model modifications. One of our contributions is a well-structured research environment that allows for full reproducibility of the results. Our analysis shows that existing OOD detection methods for NLP tasks are not yet sensitive enough to capture all samples characterized by various types of distributional shift. Background shift and randomly shuffled word order within in-domain texts are particularly challenging test scenarios. This highlights the need for future work to develop more effective OOD detection approaches for NLP problems, and our work provides a well-defined foundation for further research in this area.