Out-of-distribution (OOD) detection is essential for the reliable and safe deployment of machine learning systems in the real world, and substantial progress has been made in recent years. This paper presents the first review of recent advances in OOD detection with a particular focus on natural language processing approaches. First, we provide a formal definition of OOD detection and discuss several related fields. Second, we categorize recent algorithms into three classes according to the data they use: (1) OOD data available, (2) OOD data unavailable but in-distribution (ID) labels available, and (3) both OOD data and ID labels unavailable. Third, we introduce datasets, applications, and metrics. Finally, we summarize existing work and present potential future research topics.