The Value of Out-of-Distribution Data

Source: arXiv. Previous versions of this work were presented at the Out-of-Distribution Generalization in Computer Vision (OOD-CV) Workshop (ECCV 2022) and the Workshop on Distribution Shifts (NeurIPS 2022).

We expect the generalization error to improve with more samples from a similar task, and to deteriorate with more samples from an out-of-distribution (OOD) task. In this work, we show a counter-intuitive phenomenon: the generalization error of a task can be a non-monotonic function of the number of OOD samples. As the number of OOD samples increases, the generalization error on the target task improves before deteriorating beyond a threshold. In other words, there is value in training on small amounts of OOD data. We use Fisher's Linear Discriminant on synthetic datasets and deep networks on computer vision benchmarks such as MNIST, CIFAR-10, CINIC-10, PACS and DomainNet to demonstrate and analyze this phenomenon. In the idealistic setting where we know which samples are OOD, we show that these non-monotonic trends can be exploited using an appropriately weighted objective of the target and OOD empirical risk. While its practical utility is limited, this does suggest that if we can detect OOD samples, then there may be ways to benefit from them. When we do not know which samples are OOD, we show how a number of go-to strategies such as data-augmentation, hyper-parameter optimization, and pre-training are not enough to ensure that the target generalization error does not deteriorate with the number of OOD samples in the dataset.
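The abstract does not spell out the form of the weighted objective, so the following is only a minimal sketch under assumptions: it takes the objective to be a convex combination of the two empirical risks, with a hypothetical weight `alpha` on the target risk. The function names `weighted_risk` and `pooled_risk` are illustrative and do not come from the paper. The sketch also shows why naive pooling is risky: averaging over all samples is the same as fixing `alpha = n / (n + m)` for `n` target and `m` OOD samples, so the target risk gets less weight as OOD samples accumulate.

```python
from statistics import fmean
from typing import Sequence


def weighted_risk(
    target_losses: Sequence[float],
    ood_losses: Sequence[float],
    alpha: float,
) -> float:
    """Convex combination of the target and OOD empirical risks.

    alpha = 1 uses only the target risk; alpha = 0 uses only the OOD risk.
    Both the name `alpha` and this exact form are assumptions for illustration.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if not target_losses:
        raise ValueError("need at least one target sample")
    target_risk = fmean(target_losses)
    if not ood_losses:
        # With no OOD samples, the objective reduces to the target risk.
        return target_risk
    return alpha * target_risk + (1.0 - alpha) * fmean(ood_losses)


def pooled_risk(
    target_losses: Sequence[float],
    ood_losses: Sequence[float],
) -> float:
    """Baseline: average the loss over all samples, ignoring their origin."""
    return fmean(list(target_losses) + list(ood_losses))


if __name__ == "__main__":
    target = [1.0, 3.0]                    # per-sample losses on n = 2 target samples
    ood = [4.0, 6.0, 5.0, 5.0, 5.0, 5.0]   # per-sample losses on m = 6 OOD samples
    n, m = len(target), len(ood)
    # Pooling matches the weighted objective with alpha = n / (n + m).
    print(pooled_risk(target, ood))
    print(weighted_risk(target, ood, alpha=n / (n + m)))
```

Using this requires knowing which samples are OOD, which is the idealized setting the abstract describes. In practice `alpha` would be treated as a hyperparameter to tune on held-out target data.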


Generalization error

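A learning method's generalization ability is the ability of the model it learns to predict unseen data, and it is a fundamentally important property of the method. In practice, generalization ability is most often assessed by measuring the generalization error on test data. A generalization error bound characterizes the deviation between a learning algorithm's empirical risk and its expected risk, and the rate at which that deviation converges. The generalization error of a learning machine is a function describing the gap that remains between the student machine and the teacher machine after the student has learned from sample data.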
