A Researcher's Guide to Empirical Risk Minimization

This guide provides a reference for high-probability regret bounds in empirical risk minimization (ERM). The presentation is modular: we begin with intuition and general proof strategies, then state broadly applicable guarantees under high-level conditions and provide tools for verifying them for specific losses and function classes. We emphasize that many ERM rate derivations can be organized around a three-step recipe -- a basic inequality, a uniform local concentration bound, and a fixed-point argument -- which yields regret bounds in terms of a critical radius, defined via localized Rademacher complexity, under a mild Bernstein-type variance-risk condition. To make these bounds concrete, we upper bound the critical radius using local maximal inequalities and metric-entropy integrals, thereby recovering familiar rates for VC-subgraph, Sobolev/Hölder, and bounded-variation classes. We also study ERM with nuisance components -- including weighted ERM and Neyman-orthogonal losses -- as they arise in causal inference, missing data, and domain adaptation. Following the orthogonal statistical learning framework, we highlight that these problems often admit regret-transfer bounds linking regret under an estimated loss to population regret under the target loss. These bounds typically decompose the regret into (i) statistical error under the estimated loss and (ii) approximation error due to nuisance estimation. Under sample splitting or cross-fitting, the first term can be controlled using standard fixed-loss ERM regret bounds, while the second depends only on nuisance-estimation accuracy. As a novel contribution, we also treat the in-sample regime, in which the nuisances and the ERM are fit on the same data, deriving regret bounds and showing that fast oracle rates remain attainable under suitable smoothness and Donsker-type conditions.
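
The three-step recipe can be sketched schematically as follows. This is an illustrative outline and not the guide's formal statement: the notation ($\mathcal{F}$, $\ell_f$, $f_0$, $\hat f_n$, $P$, $P_n$, $\delta_n$) and the constants $B$ and $C$ are assumptions introduced here, the class is taken to be uniformly bounded and star-shaped around $f_0$, and confidence-level terms are absorbed into $\delta_n$.

```latex
% Setup (assumed notation): f_0 minimizes the population risk over F,
% \hat f_n minimizes the empirical risk, and R(f) denotes the regret.
\[
  f_0 \in \arg\min_{f \in \mathcal{F}} P\ell_f, \qquad
  \hat f_n \in \arg\min_{f \in \mathcal{F}} P_n \ell_f, \qquad
  R(f) := P(\ell_f - \ell_{f_0}).
\]

% Step 1 (basic inequality): P_n \ell_{\hat f_n} \le P_n \ell_{f_0} gives
\[
  R(\hat f_n) \;\le\; (P - P_n)\bigl(\ell_{\hat f_n} - \ell_{f_0}\bigr).
\]

% Step 2 (uniform local concentration): on a high-probability event,
% simultaneously for all f in F, with
% \sigma_f := \|\ell_f - \ell_{f_0}\|_{L^2(P)},
\[
  \bigl|(P_n - P)(\ell_f - \ell_{f_0})\bigr|
  \;\le\; C\bigl(\delta_n \sigma_f + \delta_n^2\bigr),
\]
% where \delta_n is at least the critical radius, i.e. the smallest
% \delta > 0 such that the localized Rademacher complexity satisfies
\[
  \mathbb{E}\sup_{f \in \mathcal{F}:\ \sigma_f \le \delta}
  \Bigl|\frac{1}{n}\sum_{i=1}^n \epsilon_i
  \bigl(\ell_f - \ell_{f_0}\bigr)(Z_i)\Bigr| \;\le\; \delta^2 .
\]

% Step 3 (fixed point): under the Bernstein condition
% \sigma_f^2 \le B\,R(f), Steps 1 and 2 combine to
\[
  R(\hat f_n) \;\le\; C\delta_n\sqrt{B\,R(\hat f_n)} + C\delta_n^2
  \;\le\; \tfrac12 R(\hat f_n) + \tfrac12 C^2 B\,\delta_n^2 + C\delta_n^2,
\]
% using ab \le a^2/2 + b^2/2. Rearranging,
\[
  R(\hat f_n) \;\le\; \bigl(C^2 B + 2C\bigr)\,\delta_n^2 .
\]
```

In this sketch the regret is of order $\delta_n^2$, so bounding the critical radius (for example through a metric-entropy integral) is what turns the argument into a concrete rate.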
