Despite the rapid development of machine learning algorithms for domain generalization (DG), there is no clear empirical evidence that existing DG algorithms outperform classic empirical risk minimization (ERM) across standard benchmarks. To better understand this phenomenon, we investigate whether DG algorithms offer benefits over ERM through the lens of label noise. Specifically, our finite-sample analysis reveals that label noise exacerbates the effect of spurious correlations for ERM, undermining generalization. Conversely, we illustrate that DG algorithms exhibit implicit label-noise robustness during finite-sample training even when spurious correlations are present. This desirable property helps mitigate spurious correlations and improves generalization in synthetic experiments. However, additional comprehensive experiments on real-world benchmark datasets indicate that label-noise robustness does not necessarily translate to better performance compared to ERM. We conjecture that the failure mode of ERM arising from spurious correlations may be less pronounced in practice.