In modern data analysis, it is common to use machine learning methods to predict outcomes on unlabeled datasets and then to use these pseudo-outcomes in subsequent statistical inference. Inference in this setting is often called post-prediction inference. We propose a novel assumption-lean framework for statistical inference in the post-prediction setting, called Prediction De-Correlated inference (PDC). Our approach is safe, in the sense that PDC automatically adapts to any black-box machine learning model and consistently outperforms its supervised counterparts. The PDC framework also extends easily to accommodate multiple predictive models. Both numerical results and real-world data analyses demonstrate the superiority of PDC over state-of-the-art methods.