Prediction-powered inference (PPI) is a recent framework for valid statistical inference with partially labeled data, combining model-based predictions on a large unlabeled set with bias correction from a smaller labeled subset. Building on existing PPI results under covariate shift, we show that PPI rectification admits a direct design-based interpretation, and that informative labeling can be handled naturally by Horvitz--Thompson and Hájek-style corrections. This connection unites design-based survey sampling ideas with modern prediction-assisted inference, yielding estimators that remain valid when labeling probabilities vary across units. We consider the common setting where the inclusion probabilities are not known but estimated from a correctly specified model. In simulations, the performance of IPW-adjusted PPI with estimated propensities closely matches the known-probability case, retaining both nominal coverage and the variance-reduction benefits of PPI.
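To make the design-based reading concrete, the following is a minimal sketch of an inverse-probability-weighted PPI estimator for the simplest target, a population mean. It is an illustration under stated assumptions, not the authors' implementation: the function name `ipw_ppi_mean` and its arguments are hypothetical, predictions are assumed to be available for every unit, and the labeling probabilities are taken as given (known, or fitted beforehand by a propensity model). The Horvitz--Thompson form divides the weighted residual sum by the full sample size; the Hájek-style form divides by the sum of the weights.

```python
import numpy as np


def ipw_ppi_mean(preds, labels, labeled, probs, hajek=False):
    """IPW-adjusted PPI estimate of a population mean (illustrative sketch).

    preds   : model predictions f(X_i) for all N units
    labels  : outcomes Y_i; entries for unlabeled units are ignored (may be NaN)
    labeled : boolean mask, True where Y_i was observed
    probs   : labeling (inclusion) probabilities, known or estimated
    hajek   : if True, normalize the rectifier by the sum of the weights
              (Hajek-style) instead of by N (Horvitz-Thompson)
    """
    preds = np.asarray(preds, dtype=float)
    labeled = np.asarray(labeled, dtype=bool)

    # Inverse-probability weights and prediction errors on labeled units only.
    weights = 1.0 / np.asarray(probs, dtype=float)[labeled]
    residuals = np.asarray(labels, dtype=float)[labeled] - preds[labeled]

    # Rectifier: weighted estimate of the mean prediction error.
    denom = weights.sum() if hajek else preds.size
    rectifier = np.sum(weights * residuals) / denom

    # PPI estimate: mean prediction over all units plus the rectifier.
    return preds.mean() + rectifier


# Toy usage: four units, two of them labeled with unequal probabilities.
toy_preds = [1.0, 2.0, 3.0, 4.0]
toy_labels = [2.0, np.nan, 5.0, np.nan]
toy_labeled = [True, False, True, False]
toy_probs = [0.5, 0.5, 0.25, 0.25]

ht_estimate = ipw_ppi_mean(toy_preds, toy_labels, toy_labeled, toy_probs)
hajek_estimate = ipw_ppi_mean(toy_preds, toy_labels, toy_labeled, toy_probs, hajek=True)
```

When every unit is labeled with probability one, both variants reduce to the plain mean of the labels, which is a quick sanity check on the rectification step.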