We introduce prediction-powered inference, a framework for performing valid statistical inference when an experimental data set is supplemented with predictions from a machine-learning system. Our framework yields provably valid conclusions without making any assumptions on the machine-learning algorithm that supplies the predictions. Higher accuracy of the predictions translates to smaller confidence intervals, permitting more powerful inference. Prediction-powered inference yields simple algorithms for computing valid confidence intervals for statistical objects such as means, quantiles, and linear and logistic regression coefficients. We demonstrate the benefits of prediction-powered inference with data sets from proteomics, genomics, electronic voting, remote sensing, census analysis, and ecology.
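To make the mean case concrete, the following is a minimal sketch rather than the authors' reference implementation. It assumes the estimator is the average prediction on a large unlabeled set plus a correction term (the average of label minus prediction) estimated on a small labeled set, with a normal-approximation interval. The function name `ppi_mean_ci`, its arguments, and the synthetic data are all hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean, variance


def ppi_mean_ci(y_labeled, preds_labeled, preds_unlabeled, alpha=0.05):
    """Sketch of a prediction-powered confidence interval for a mean.

    Assumption (not stated in the abstract): the point estimate is
    mean(predictions on unlabeled data) + mean(label - prediction on labeled data),
    and the interval uses a normal approximation.
    """
    n = len(y_labeled)
    N = len(preds_unlabeled)
    # Correction term: how far the predictions are off, measured on labeled data.
    rectifier = [y - f for y, f in zip(y_labeled, preds_labeled)]
    estimate = fmean(preds_unlabeled) + fmean(rectifier)
    # The two terms come from independent samples, so their variances add.
    std_err = math.sqrt(variance(preds_unlabeled) / N + variance(rectifier) / n)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return estimate - z * std_err, estimate + z * std_err


if __name__ == "__main__":
    # Synthetic illustration: the "model" predicts the outcome with small noise.
    rng = random.Random(0)
    truth_labeled = [rng.gauss(10.0, 2.0) for _ in range(200)]
    preds_labeled = [y + rng.gauss(0.0, 0.5) for y in truth_labeled]
    preds_unlabeled = [rng.gauss(10.0, 2.0) + rng.gauss(0.0, 0.5) for _ in range(20000)]
    low, high = ppi_mean_ci(truth_labeled, preds_labeled, preds_unlabeled)
    print(f"95% interval for the mean: [{low:.3f}, {high:.3f}]")
```

Under these assumptions, the interval width is driven by the variance of the correction term: the closer the predictions are to the labels, the smaller that variance and the narrower the interval, which matches the abstract's claim that more accurate predictions permit more powerful inference.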