Empirical risk minimization (ERM) is a fundamental machine learning paradigm. However, its generalization ability is limited in various tasks. In this paper, we devise Dummy Risk Minimization (DuRM), a frustratingly easy and general technique to improve the generalization of ERM. DuRM is extremely simple to implement: enlarge the dimension of the output logits and then optimize with standard gradient descent. We validate the efficacy of DuRM through both theoretical and empirical analysis. Theoretically, we show that DuRM yields greater gradient variance, which facilitates model generalization by helping optimization reach flatter local minima. Empirically, we evaluate DuRM across different datasets, modalities, and network architectures on diverse tasks, including conventional classification, semantic segmentation, out-of-distribution generalization, adversarial training, and long-tailed recognition. Results demonstrate that DuRM consistently improves performance on all of these tasks in an almost free-lunch manner. Furthermore, we show that DuRM is compatible with existing generalization techniques, and we discuss possible limitations. We hope that DuRM will trigger new interest in fundamental research on risk minimization.
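To make the "enlarge the output logits" step concrete, here is a minimal sketch in plain Python, not the authors' implementation. It assumes a linear softmax classifier trained with cross-entropy and per-sample gradient descent on a toy dataset. The class name `DummyLogitLinear`, the parameter `num_dummy`, and the choice to predict over the original classes only at inference time are illustrative assumptions that the abstract does not specify.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class DummyLogitLinear:
    """Linear classifier that outputs num_classes + num_dummy logits.

    `num_dummy` is a hypothetical hyperparameter name. No training label
    ever points at a dummy class; the extra logits only take part in the
    softmax normalization.
    """

    def __init__(self, in_dim, num_classes, num_dummy=1, seed=0):
        rng = random.Random(seed)
        self.num_classes = num_classes
        self.out_dim = num_classes + num_dummy  # the enlarged output
        self.w = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)]
                  for _ in range(self.out_dim)]
        self.b = [0.0] * self.out_dim

    def logits(self, x):
        return [sum(wi * xi for wi, xi in zip(row, x)) + bk
                for row, bk in zip(self.w, self.b)]

    def loss(self, x, y):
        # Standard cross-entropy, computed over all real + dummy logits.
        return -math.log(softmax(self.logits(x))[y])

    def sgd_step(self, x, y, lr=0.1):
        # Gradient of cross-entropy w.r.t. logit k is p_k - 1[k == y].
        probs = softmax(self.logits(x))
        for k in range(self.out_dim):
            grad_k = probs[k] - (1.0 if k == y else 0.0)
            for j, xj in enumerate(x):
                self.w[k][j] -= lr * grad_k * xj
            self.b[k] -= lr * grad_k

    def predict(self, x):
        # Assumption: ignore the dummy logits when predicting.
        real = self.logits(x)[:self.num_classes]
        return max(range(self.num_classes), key=real.__getitem__)


# Toy two-class data; labels only ever use the real classes 0 and 1.
data = [([1.0, 0.0], 0), ([0.9, 0.2], 0), ([0.0, 1.0], 1), ([0.1, 0.8], 1)]
model = DummyLogitLinear(in_dim=2, num_classes=2, num_dummy=2)

initial_loss = sum(model.loss(x, y) for x, y in data) / len(data)
for _ in range(200):
    for x, y in data:
        model.sgd_step(x, y)
final_loss = sum(model.loss(x, y) for x, y in data) / len(data)
predictions = [model.predict(x) for x, _ in data]
```

The only change relative to a plain ERM classifier in this sketch is `out_dim = num_classes + num_dummy`; the loss and the update rule are the usual ones, which is what makes the technique cheap to try.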