It is now common in modern data analysis to predict outcomes on unlabeled datasets with machine-learning methods and then use these pseudo-outcomes in subsequent statistical inference. Inference in this setting is often called post-prediction inference. We propose a novel, assumption-lean framework for inference in the post-prediction setting, called \emph{Prediction De-Correlated inference} (PDC). Our approach automatically adapts to any black-box machine-learning model and consistently outperforms supervised methods. The PDC framework is also easy to extend to accommodate multiple predictive models. Both numerical results and real-world data analysis support our theoretical results.