We expect the generalization error to improve with more samples from a similar task, and to deteriorate with more samples from an out-of-distribution (OOD) task. In this work, we show a counter-intuitive phenomenon: the generalization error of a task can be a non-monotonic function of the number of OOD samples. As the number of OOD samples increases, the generalization error on the target task improves before deteriorating beyond a threshold. In other words, there is value in training on small amounts of OOD data. We use Fisher's Linear Discriminant on synthetic datasets and deep networks on computer vision benchmarks such as MNIST, CIFAR-10, CINIC-10, PACS and DomainNet to demonstrate and analyze this phenomenon. In the idealized setting where we know which samples are OOD, we show that these non-monotonic trends can be exploited using an appropriately weighted objective of the target and OOD empirical risks. While the practical utility of this result is limited, it does suggest that if we can detect OOD samples, then there may be ways to benefit from them. When we do not know which samples are OOD, we show that a number of go-to strategies, such as data augmentation, hyper-parameter optimization, and pre-training, are not enough to ensure that the target generalization error does not deteriorate with the number of OOD samples in the dataset.
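To make the weighted objective concrete, the following is a minimal sketch rather than the exact formulation used in this work. It assumes the objective is a convex combination of the two empirical risks, with a hypothetical weight `alpha` on the target risk; the function name `weighted_risk` and the toy per-sample losses are illustrative only.

```python
def mean(values):
    """Average of a list of per-sample losses; 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


def weighted_risk(target_losses, ood_losses, alpha):
    """Assumed form: alpha * target risk + (1 - alpha) * OOD risk.

    alpha is a hypothetical hyper-parameter in [0, 1]. alpha = 1 ignores
    the OOD samples entirely; alpha = 0 ignores the target samples.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * mean(target_losses) + (1.0 - alpha) * mean(ood_losses)


# Toy per-sample losses (made up for illustration): few target samples,
# more OOD samples with a higher loss.
target = [0.2, 0.4]
ood = [1.0, 1.0, 1.0, 1.0]

# Naive pooling weights every sample equally, so the larger OOD set dominates.
pooled = mean(target + ood)                        # about 0.767

# The weighted objective fixes the OOD contribution regardless of its size.
weighted = weighted_risk(target, ood, alpha=0.8)   # about 0.44
```

In this sketch the weight decouples the influence of the OOD samples from how many of them there are, which is the property needed to keep the benefit of a few OOD samples without letting a large OOD set dominate training. Building such an objective requires knowing which samples are OOD.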