A key challenge of modern machine learning systems is to achieve Out-of-Distribution (OOD) generalization -- generalizing to target data whose distribution differs from that of the source data. Despite its importance, the fundamental question of ``what are the most effective algorithms for OOD generalization'' remains open even under the standard setting of covariate shift. This paper addresses this question by proving that, surprisingly, classical Maximum Likelihood Estimation (MLE) using only source data (without any modification) achieves minimax optimality for covariate shift under the well-specified setting. That is, no algorithm performs better than MLE in this setting (up to a constant factor), which justifies the claim that MLE is all you need. Our result holds for a very rich class of parametric models and does not require any boundedness condition on the density ratio. We illustrate the wide applicability of our framework by instantiating it on three concrete examples -- linear regression, logistic regression, and phase retrieval. We further complement the study by proving that, under the misspecified setting, MLE is no longer the optimal choice, whereas the Maximum Weighted Likelihood Estimator (MWLE) emerges as minimax optimal in certain scenarios.
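To make the well-specified claim concrete, the following is a minimal simulation sketch, not an experiment from the paper. It assumes a linear model with Gaussian noise (so the MLE is ordinary least squares), a source covariate distribution N(0, I), a target distribution N(mu, I), and a known density ratio w(x) = exp(mu^T x - |mu|^2 / 2), which is unbounded. The function names, the sample size, the dimension, and the choice of mu are all hypothetical and chosen only for illustration. The sketch compares the target excess risk of MLE fitted on source data alone with that of MWLE fitted using the true density ratio as weights.

```python
import numpy as np


def target_excess_risk(beta_hat, beta_star, second_moment_target):
    """Excess squared-error risk on the target: E_T[(x^T (beta_hat - beta_star))^2]."""
    diff = beta_hat - beta_star
    return float(diff @ second_moment_target @ diff)


def simulate(n=200, d=3, trials=200, noise_std=1.0, seed=0):
    """Average target excess risk of MLE and MWLE over repeated source samples.

    Assumed setup (illustrative only): source x ~ N(0, I), target x ~ N(mu, I),
    well-specified model y = x^T beta_star + Gaussian noise.
    """
    rng = np.random.default_rng(seed)
    beta_star = np.ones(d)
    mu = np.zeros(d)
    mu[0] = 1.0  # hypothetical mean shift of the target covariates
    # Second moment of target covariates: E_T[x x^T] = I + mu mu^T.
    second_moment_target = np.eye(d) + np.outer(mu, mu)

    mle_risks, mwle_risks = [], []
    for _ in range(trials):
        # Draw labeled data from the source distribution only.
        X = rng.standard_normal((n, d))
        y = X @ beta_star + noise_std * rng.standard_normal(n)

        # MLE under Gaussian noise is unweighted least squares.
        beta_mle, *_ = np.linalg.lstsq(X, y, rcond=None)

        # MWLE: weight each sample by the true density ratio p_T(x) / p_S(x).
        w = np.exp(X @ mu - 0.5 * (mu @ mu))
        sqrt_w = np.sqrt(w)
        beta_mwle, *_ = np.linalg.lstsq(X * sqrt_w[:, None], y * sqrt_w, rcond=None)

        mle_risks.append(target_excess_risk(beta_mle, beta_star, second_moment_target))
        mwle_risks.append(target_excess_risk(beta_mwle, beta_star, second_moment_target))

    return float(np.mean(mle_risks)), float(np.mean(mwle_risks))


if __name__ == "__main__":
    mle_risk, mwle_risk = simulate()
    print(f"MLE  target excess risk: {mle_risk:.4f}")
    print(f"MWLE target excess risk: {mwle_risk:.4f}")
```

In this well-specified toy setup one would expect the unweighted MLE to attain the lower average target risk, since reweighting by an unbounded density ratio adds variance without removing any bias. The sketch does not cover the misspecified setting, where the abstract states that MWLE can be the better choice.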