Predicting outcomes on unlabeled datasets with machine-learning methods and then using these pseudo-outcomes in subsequent statistical inference is common in modern data analysis. Inference in this setting is often called post-prediction inference. We propose a novel, assumption-lean framework for inference in the post-prediction setting, called \emph{Prediction De-Correlated inference} (PDC). Our approach automatically adapts to any black-box machine-learning model and consistently outperforms supervised methods. The PDC framework is also easily extended to accommodate multiple predictive models. Both numerical results and real-world data analysis support our theoretical results.