We expect the generalization error to improve with more samples from a similar task, and to deteriorate with more samples from an out-of-distribution (OOD) task. In this work, we show a counter-intuitive phenomenon: the generalization error on a target task can be a non-monotonic function of the number of OOD samples. As the number of OOD samples increases, the generalization error on the target task first improves and then, beyond a threshold, deteriorates. In other words, there is value in training on small amounts of OOD data. We use Fisher's Linear Discriminant on synthetic datasets and deep networks on computer vision benchmarks such as MNIST, CIFAR-10, CINIC-10, PACS and DomainNet to demonstrate and analyze this phenomenon. In the idealized setting where we know which samples are OOD, we show that these non-monotonic trends can be exploited using an appropriately weighted objective of the target and OOD empirical risk. Although the practical utility of this approach is limited, it suggests that if we can detect OOD samples, then there may be ways to benefit from them. When we do not know which samples are OOD, we show that a number of go-to strategies such as data augmentation, hyperparameter optimization, and pre-training are not enough to ensure that the target generalization error does not deteriorate with the number of OOD samples in the dataset.
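To make the idea of a weighted objective over the target and OOD empirical risks concrete, here is a minimal sketch. The abstract does not state the exact form of the weighting, so the convex combination below, the function name `weighted_risk`, and the weight `alpha` are illustrative assumptions rather than the authors' formulation.

```python
import numpy as np


def weighted_risk(target_losses, ood_losses, alpha):
    """Combine per-sample losses into one weighted empirical risk.

    Assumed form (not taken from the paper):
        alpha * mean(target losses) + (1 - alpha) * mean(OOD losses)
    where alpha in [0, 1] is a hypothetical weight on the target task.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")

    target_losses = np.asarray(target_losses, dtype=float)
    ood_losses = np.asarray(ood_losses, dtype=float)

    target_risk = target_losses.mean()
    # With no OOD samples the objective reduces to the target risk alone.
    if ood_losses.size == 0:
        return float(target_risk)

    ood_risk = ood_losses.mean()
    return float(alpha * target_risk + (1.0 - alpha) * ood_risk)
```

Under this assumed form, `alpha = 1` ignores the OOD samples entirely, while smaller values of `alpha` let them contribute more to the objective. Because each risk is averaged separately, the weight does not depend on how many OOD samples are present, unlike naively pooling all samples into a single average.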