We study the problem of efficiently detecting Out-of-Distribution (OOD) samples at test time in supervised and unsupervised learning contexts. While ML models are typically trained under the assumption that training and test data stem from the same distribution, this often does not hold in realistic settings, so reliably detecting distribution shifts is crucial at deployment. We reformulate the OOD problem through the lens of statistical testing and then discuss conditions under which the OOD problem is identifiable in statistical terms. Building on this framework, we study convergence guarantees of an OOD test based on the Wasserstein distance, and provide a simple empirical evaluation.
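As a minimal illustration of the kind of test the abstract describes, the sketch below runs a two-sample permutation test using the 1-D Wasserstein distance: the distance between a reference (in-distribution) sample and a test sample is compared against a null distribution obtained by shuffling the pooled data. The function name, permutation scheme, and threshold are illustrative assumptions, not the paper's actual procedure.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def wasserstein_ood_test(reference, test, n_permutations=1000, seed=0):
    """Permutation two-sample test based on the 1-D Wasserstein distance.

    Returns the observed distance and a p-value for the null hypothesis
    that `reference` and `test` are drawn from the same distribution.
    """
    rng = np.random.default_rng(seed)
    observed = wasserstein_distance(reference, test)
    pooled = np.concatenate([reference, test])
    n_ref = len(reference)

    # Null distribution: distances between random splits of the pooled sample.
    exceed = 0
    for _ in range(n_permutations):
        perm = rng.permutation(pooled)
        if wasserstein_distance(perm[:n_ref], perm[n_ref:]) >= observed:
            exceed += 1

    # Add-one correction keeps the p-value strictly positive.
    p_value = (exceed + 1) / (n_permutations + 1)
    return observed, p_value

rng = np.random.default_rng(42)
in_dist = rng.normal(loc=0.0, scale=1.0, size=500)   # reference data
shifted = rng.normal(loc=0.5, scale=1.0, size=500)   # mean-shifted test data

dist, p = wasserstein_ood_test(in_dist, shifted)
print(f"Wasserstein distance: {dist:.3f}, p-value: {p:.4f}")
```

With a mean shift of 0.5 and 500 samples per group, the permutation p-value should be small, flagging the test sample as out-of-distribution; for two samples from the same distribution, the p-value would be approximately uniform on (0, 1).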