Prediction-based (PB) inference is increasingly used in applications where the outcome of interest is difficult to obtain, but its predictors are readily available. Unlike traditional inference, PB inference combines a partially observed outcome and a set of covariates with predictions of the outcome generated by a machine learning (ML) model. Motwani and Witten (2023) recently revisited two innovative PB inference approaches for ordinary least squares. They found that the method proposed by Wang et al. (2020) yields a consistent estimator for the association of interest when the ML model perfectly captures the underlying regression function. In contrast, the prediction-powered inference (PPI) method proposed by Angelopoulos et al. (2023) yields valid inference regardless of the model's accuracy. In this paper, we study the statistical efficiency of the PPI estimator. Our analysis reveals that a more efficient estimator, proposed 25 years ago by Chen and Chen (2000), can be obtained by simply adding a weight to the PPI estimator. We also contextualize PB inference with methods from the economics and statistics literature dating back to the 1960s. Our extensive theoretical and numerical analyses indicate that the Chen and Chen (CC) estimator offers a balance between robustness to ML model specification and statistical efficiency, making it the preferred choice for use in practice.
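To make the idea of "adding a weight" concrete, the following is a minimal sketch for the simpler problem of estimating a mean rather than the OLS setting studied in the paper. It is not the CC estimator itself: the function names, the simulated data, and the control-variate style weight are all assumptions chosen for illustration. Setting `w = 1.0` gives a PPI-style mean estimate, and the weight below is the one that minimizes the variance of this particular weighted form.

```python
import numpy as np

def weighted_ppi_mean(y_lab, f_lab, f_unlab, w=1.0):
    """Labeled-data mean of Y, corrected by w times the gap in mean predictions.

    w = 1.0 gives a PPI-style mean estimate; w = 0.0 ignores the predictions.
    (Hypothetical helper for illustration only.)
    """
    return y_lab.mean() - w * (f_lab.mean() - f_unlab.mean())

def variance_minimizing_weight(y_lab, f_lab, n_unlab):
    """Plug-in weight minimizing the variance of weighted_ppi_mean.

    Assumes labeled and unlabeled samples are i.i.d. from one population.
    """
    n = len(y_lab)
    cov_yf = np.cov(y_lab, f_lab)[0, 1]   # sample covariance of Y and prediction
    var_f = np.var(f_lab, ddof=1)         # sample variance of the prediction
    return (n_unlab / (n + n_unlab)) * cov_yf / var_f

def simulate(n=100, n_unlab=1000, reps=2000, seed=0):
    """Compare sampling variances when the predictions are deliberately poor."""
    rng = np.random.default_rng(seed)
    ppi, weighted = [], []
    for _ in range(reps):
        y_lab = rng.normal(size=n)
        f_lab = 0.5 * y_lab + rng.normal(size=n)            # noisy prediction
        y_unlab = rng.normal(size=n_unlab)                  # never seen by the estimators
        f_unlab = 0.5 * y_unlab + rng.normal(size=n_unlab)
        w = variance_minimizing_weight(y_lab, f_lab, n_unlab)
        ppi.append(weighted_ppi_mean(y_lab, f_lab, f_unlab, w=1.0))
        weighted.append(weighted_ppi_mean(y_lab, f_lab, f_unlab, w=w))
    return np.var(ppi), np.var(weighted)
```

Calling `simulate()` returns the empirical variances of the unweighted and weighted estimates across repetitions. In this toy setup the predictions are weak, so the unweighted correction adds noise, while the weighted version shrinks the correction toward zero.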