Obtaining high-quality labels is costly, whereas unlabeled covariates are often abundant, motivating semi-supervised inference methods with reliable uncertainty quantification. Prediction-powered inference (PPI) leverages a machine-learning predictor trained on a small labeled sample to improve efficiency, but it can lose efficiency under model misspecification and suffer from coverage distortions due to label reuse. We introduce Machine-Learning-Assisted Generalized Entropy Calibration (MEC), a cross-fitted, calibration-weighted variant of PPI. MEC improves efficiency by reweighting labeled samples to better align with the target population, using a principled calibration framework based on Bregman projections. This yields robustness to affine transformations of the predictor and relaxes requirements for validity by replacing conditions on raw prediction error with weaker projection-error conditions. As a result, MEC attains the semiparametric efficiency bound under weaker assumptions than existing PPI variants. Across simulations and a real-data application, MEC achieves near-nominal coverage and tighter confidence intervals than CF-PPI and vanilla PPI.
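To make the reweighting idea concrete, the sketch below contrasts the vanilla PPI point estimate of a population mean with a calibration-weighted estimate. It is a simplified illustration, not the MEC procedure: it assumes a quadratic (chi-square) entropy, for which the calibration weights have a closed form, it omits cross-fitting and variance estimation, and it treats the unlabeled sample as the target population. The function names `ppi_mean`, `calibration_weights`, and `calibrated_mean` are hypothetical.

```python
import numpy as np


def ppi_mean(y_lab, f_lab, f_unlab):
    """Vanilla PPI point estimate of E[Y]: mean prediction on the
    unlabeled sample plus the mean labeled residual."""
    return f_unlab.mean() + (y_lab - f_lab).mean()


def calibration_weights(f_lab, f_unlab):
    """Weights closest to uniform (in squared distance) subject to
    sum(w) = 1 and sum(w * f_lab) = mean of f on the unlabeled sample.

    Assumption: quadratic entropy only. Other entropies in the
    generalized family would need an iterative solver.
    Requires f_lab to be non-constant so the 2x2 system is solvable.
    """
    n = f_lab.shape[0]
    Z = np.column_stack([np.ones(n), f_lab])      # calibration covariates
    target = np.array([1.0, f_unlab.mean()])      # totals to be matched
    base = np.full(n, 1.0 / n)                    # uniform starting weights
    lam = np.linalg.solve(Z.T @ Z, target - Z.T @ base)
    return base + Z @ lam


def calibrated_mean(y_lab, f_lab, f_unlab):
    """Calibration-weighted estimate of E[Y] from the labeled sample."""
    return calibration_weights(f_lab, f_unlab) @ y_lab
```

Because the constraints involve only the span of the intercept and the predictor, replacing the predictor `f` with `a * f + b` (for `a != 0`) leaves these weights unchanged, which mirrors the affine robustness described above; the vanilla PPI estimate does not share this property in general.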