Bregman projection for calibration estimation

Calibration weighting is a fundamental technique in survey sampling and data integration for incorporating auxiliary information and improving the efficiency of estimators. Classical calibration methods are typically formulated through distance functions applied to the ratios of calibrated weights to design weights. In this paper, we develop a unified framework for calibration estimation based on a Bregman divergence defined directly on the weight vector. We show that calibration estimators obtained from a Bregman divergence admit a dual representation whose dimension equals that of the auxiliary variables, and that they can be interpreted as Bregman projections onto the calibration constraint set. This geometric structure leads to a general asymptotic representation showing that calibration estimators are asymptotically equivalent to debiased regression estimators whose regression coefficient depends on the choice of the Bregman generator. The result provides a unifying perspective on classical calibration methods such as quadratic calibration and exponential tilting, and reveals how the choice of divergence influences efficiency. Under Poisson sampling, we further characterize the generator that minimizes the asymptotic variance of the calibration estimator and obtain an optimal contrast entropy divergence. The framework also extends naturally to settings where inclusion probabilities are unknown and must be estimated, yielding cross-fitted estimators that remain root-n consistent under mild conditions. Finally, we develop a regularized calibration estimator suitable for high-dimensional auxiliary variables. Simulation studies and a real data application illustrate the practical advantages of the proposed approach.
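
To make the dual representation concrete, the following is a minimal sketch under stated assumptions, not the paper's implementation. It assumes the generic problem of minimizing the sum of Bregman divergences D_G(w_i, d_i) subject to the calibration constraint sum_i w_i x_i = T. The standard Lagrangian argument then gives g(w_i) = g(d_i) + x_i'lambda with g = G', so only the p-dimensional multiplier lambda has to be solved for. The function name `calibrate_bregman`, the two generators shown, and the synthetic data are all illustrative choices, and the exact constraint set used in the paper may differ.

```python
import numpy as np


def calibrate_bregman(X, d, T, generator="entropy", tol=1e-10, max_iter=50):
    """Calibrate design weights d so that X.T @ w equals the totals T.

    Hypothetical helper: solves the p-dimensional dual of
        min_w sum_i D_G(w_i, d_i)  subject to  X.T @ w = T
    for two textbook generators G.
    """
    X = np.asarray(X, dtype=float)
    d = np.asarray(d, dtype=float)
    T = np.asarray(T, dtype=float)

    if generator == "quadratic":
        # G(w) = w^2 / 2, g(w) = w, so w = d + X @ lam and the
        # dual equation X.T @ w = T is linear in lam.
        lam = np.linalg.solve(X.T @ X, T - X.T @ d)
        return d + X @ lam

    if generator != "entropy":
        raise ValueError("generator must be 'quadratic' or 'entropy'")

    # G(w) = w log w - w, g(w) = log w, so w = d * exp(X @ lam)
    # (exponential tilting). Solve X.T @ w(lam) = T by Newton's method.
    lam = np.zeros(X.shape[1])
    for _ in range(max_iter):
        w = d * np.exp(X @ lam)
        resid = X.T @ w - T
        if np.max(np.abs(resid)) <= tol * (1.0 + np.max(np.abs(T))):
            return w
        # Jacobian of the dual equation: X' diag(w) X.
        jac = X.T @ (w[:, None] * X)
        lam = lam - np.linalg.solve(jac, resid)
    raise RuntimeError("Newton iteration did not converge")


# Synthetic illustration (made-up data, not from the paper).
rng = np.random.default_rng(0)
n = 200
x = rng.normal(2.0, 1.0, size=n)
X = np.column_stack([np.ones(n), x])      # intercept plus one auxiliary variable
d = rng.uniform(5.0, 15.0, size=n)        # design weights
T = (X.T @ d) * np.array([1.02, 0.97])    # assumed known totals

w_quad = calibrate_bregman(X, d, T, generator="quadratic")
w_ent = calibrate_bregman(X, d, T, generator="entropy")
```

Both sets of weights satisfy the calibration constraint, but they differ in form: the quadratic generator shifts the design weights additively and can produce negative weights, whereas the entropy generator rescales them multiplicatively and keeps them positive. The plain Newton step has no line search, so it is adequate only when the targets are not far from the design-weighted totals.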
