Prior work typically describes out-of-domain (OOD) or out-of-distribution (OODist) samples as those that originate from datasets or sources different from the training set but belong to the same task. Models are generally reported to perform worse on OOD samples than on in-domain (ID) samples, although this observation is not consistent. Another thread of research has focused on OOD detection, albeit mostly using supervised approaches. In this work, we first consolidate and systematically analyze the multiple definitions of OOD and OODist discussed in prior literature. We then analyze the performance of a model under ID and OOD/OODist settings in a principled way. Finally, we seek an unsupervised method that reliably identifies OOD/OODist samples without using a trained model. The results of our extensive evaluation on 12 datasets from 4 different tasks suggest that unsupervised metrics are promising for this task.