Estimators in statistics and machine learning must typically trade off efficiency (low variance for a fixed target) against distributional robustness (for example multiaccuracy, or low bias over a range of possible targets). In this paper, we consider a simple estimator, ridge boosting: starting with any initial predictor, perform a single boosting step with (kernel) ridge regression. Surprisingly, we show that ridge boosting achieves efficiency and distributional robustness simultaneously: for target distribution shifts that lie within an RKHS unit ball, the estimator maintains low bias across all such shifts and attains variance at the semiparametric efficiency bound for each target. In addition to bridging otherwise distinct research areas, this result has immediate practical value. Because ridge boosting uses only data from the source distribution, researchers can train a single model to obtain both robust and efficient estimates for multiple target estimands at the same time, eliminating the need to fit a separate semiparametrically efficient estimator for each target. We assess this approach through simulations and an application estimating the age profile of retirement income.
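To make the ridge boosting step concrete, the following is a minimal sketch, not the paper's implementation. It assumes scalar covariates, a Gaussian kernel, and a fixed ridge penalty, and it assumes that a target estimate is formed by averaging the boosted predictions over a sample of target covariates. All function names (`rbf_kernel`, `solve_linear_system`, `ridge_boost`, `average_prediction`) and parameters (`ridge_penalty`, `bandwidth`) are hypothetical and chosen for illustration only.

```python
import math


def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel between two scalar covariates (assumed kernel choice)."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))


def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix [A | b]
    for col in range(n):
        # Move the row with the largest entry in this column into pivot position.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def ridge_boost(initial_predictor, xs, ys, ridge_penalty=0.1, bandwidth=1.0):
    """One boosting step: fit kernel ridge regression to the residuals of
    initial_predictor on source data (xs, ys) and add the fit back."""
    residuals = [y - initial_predictor(x) for x, y in zip(xs, ys)]
    n = len(xs)
    # Regularized Gram matrix K + ridge_penalty * I (positive definite for penalty > 0).
    gram = [
        [rbf_kernel(xs[i], xs[j], bandwidth) + (ridge_penalty if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]
    alphas = solve_linear_system(gram, residuals)

    def boosted_predictor(x):
        correction = sum(a * rbf_kernel(xi, x, bandwidth) for a, xi in zip(alphas, xs))
        return initial_predictor(x) + correction

    return boosted_predictor


def average_prediction(predictor, target_xs):
    """Assumed plug-in estimate for a target: mean prediction over target covariates."""
    return sum(predictor(x) for x in target_xs) / len(target_xs)


# Source data and a deliberately misspecified initial predictor (illustrative only).
source_xs = [i / 10.0 for i in range(20)]
source_ys = [math.sin(x) + 0.5 * x for x in source_xs]
boosted = ridge_boost(lambda x: 0.5 * x, source_xs, source_ys)

# The same boosted model is reused for two hypothetical target covariate samples.
estimate_low = average_prediction(boosted, [0.1, 0.3, 0.5])
estimate_high = average_prediction(boosted, [1.2, 1.5, 1.8])
```

The sketch is written in pure Python to stay dependency-free; in practice the linear solve would more likely be delegated to a numerical library such as NumPy or to scikit-learn's `KernelRidge`. Note that the model is fit once on source data and then evaluated on each target sample, which mirrors the single-model, multiple-target use described above.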