In this work, we study the weighted empirical risk minimization (weighted ERM) schema, in which an additional data-dependent weight function is incorporated into the empirical risk being minimized. We show that, under a general ``balanceable'' Bernstein condition, one can design a weighted ERM estimator that outperforms the standard ERM estimator in certain sub-regions, with the improvement appearing as a data-dependent constant term in the error bound. These sub-regions are the large-margin regions in classification settings and the low-variance regions in heteroscedastic regression settings. Our findings are supported by evidence from synthetic data experiments.
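To make the setup concrete, the weighted ERM objective can be sketched as follows. The notation is assumed for illustration and is not taken from the paper: $(X_i, Y_i)_{i=1}^{n}$ denotes the sample, $\mathcal{F}$ the hypothesis class, $\ell$ the loss, and $\hat{w}$ a nonnegative weight function that may itself be estimated from data.

\[
\hat{f}_{\hat{w}} \in \operatorname*{arg\,min}_{f \in \mathcal{F}} \; \frac{1}{n} \sum_{i=1}^{n} \hat{w}(X_i)\, \ell\bigl(f(X_i), Y_i\bigr)
\]

Setting $\hat{w} \equiv 1$ recovers standard ERM. In a heteroscedastic regression setting, one plausible choice under these assumptions is a weight that is larger where the estimated noise variance is smaller; the specific weight design is not stated in the abstract.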