We study the problem of efficiently detecting out-of-distribution (OOD) samples at test time in supervised and unsupervised learning settings. While ML models are typically trained under the assumption that training and test data are drawn from the same distribution, this assumption often fails in realistic settings, so reliably detecting distribution shift at deployment is crucial. We reformulate the OOD problem through the lens of statistical testing and discuss conditions under which the problem becomes identifiable in statistical terms. Building on this framework, we study convergence guarantees for an OOD test based on the Wasserstein distance and provide a simple empirical evaluation.
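To make the idea concrete, here is a minimal sketch of one way such a test could look in practice: a two-sample permutation test on the one-dimensional Wasserstein distance between a reference (training) sample and a test batch, using SciPy's `wasserstein_distance`. This is an illustrative assumption on our part, not the paper's actual procedure; the function name `ood_wasserstein_test` and all parameters are hypothetical.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def ood_wasserstein_test(ref, test, n_perm=500, seed=0):
    """Permutation two-sample test based on the 1-D Wasserstein distance.

    Returns the observed distance and a permutation p-value; a small
    p-value suggests the test batch is out-of-distribution relative
    to the reference sample. (Hypothetical helper for illustration.)
    """
    rng = np.random.default_rng(seed)
    obs = wasserstein_distance(ref, test)
    pooled = np.concatenate([ref, test])
    n = len(ref)
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        # Re-split the pooled sample at random and recompute the distance.
        if wasserstein_distance(perm[:n], perm[n:]) >= obs:
            count += 1
    # Add-one correction keeps the p-value strictly positive.
    return obs, (count + 1) / (n_perm + 1)

rng = np.random.default_rng(42)
in_dist = rng.normal(0.0, 1.0, size=200)   # reference (training) sample
shifted = rng.normal(1.5, 1.0, size=100)   # mean-shifted test batch
d, p = ood_wasserstein_test(in_dist, shifted)
```

Under a clear mean shift like the one above, the permutation p-value is small and the shift is flagged; the sketch ignores the multivariate case, where estimating the Wasserstein distance is substantially harder.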