Estimators in statistics and machine learning must typically trade off between efficiency, i.e., having low variance for a fixed target, and distributional robustness, i.e., having low bias over a range of possible targets (as in \textit{multiaccuracy}). In this paper, we consider a simple estimator, \emph{ridge boosting}: starting with any initial predictor, perform a single boosting step with (kernel) ridge regression. Surprisingly, we show that ridge boosting simultaneously achieves both efficiency and distributional robustness: for target distribution shifts that lie within an RKHS unit ball, this estimator maintains low bias across all such shifts and attains variance at the semiparametric efficiency bound for each target. In addition to bridging otherwise distinct research areas, this result has immediate practical value. Since ridge boosting uses only data from the source distribution, researchers can train a single model to obtain both robust and efficient estimates for multiple target estimands at the same time, eliminating the need to fit a separate semiparametric efficient estimator for each target. We assess this approach through simulations and an application estimating the age profile of retirement income.
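The ridge-boosting step described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration (not the paper's implementation): we assume an RBF kernel, and the function names and hyperparameters (\texttt{lam}, \texttt{gamma}) are illustrative. The single boosting step fits a kernel ridge regression to the residuals of the initial predictor and adds the fitted correction.

```python
import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    # Gaussian (RBF) kernel matrix between the rows of X and Z
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def ridge_boost(X, y, f0, lam=1.0, gamma=1.0):
    """One boosting step with kernel ridge regression.

    Fits the residuals y - f0(X) by kernel ridge regression and
    returns the corrected predictor f1 = f0 + (KRR fit to residuals).
    """
    r = y - f0(X)                                   # residuals of the initial predictor
    K = rbf_kernel(X, X, gamma)
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), r)  # KRR dual coefficients

    def f1(Xnew):
        return f0(Xnew) + rbf_kernel(Xnew, X, gamma) @ alpha

    return f1
```

On simulated data, a single such step reduces the training error of any initial predictor, while the kernel choice fixes the RKHS ball over which the robustness guarantee is stated.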