We introduce prediction-powered inference, a framework for performing valid statistical inference when an experimental data set is supplemented with predictions from a machine-learning system. Our framework yields provably valid conclusions without making any assumptions on the machine-learning algorithm that supplies the predictions. Higher accuracy of the predictions translates to smaller confidence intervals, permitting more powerful inference. Prediction-powered inference yields simple algorithms for computing valid confidence intervals for statistical objects such as means, quantiles, and linear and logistic regression coefficients. We demonstrate the benefits of prediction-powered inference with data sets from proteomics, genomics, electronic voting, remote sensing, census analysis, and ecology.
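As an illustration of the simplest case mentioned above, the mean, here is a minimal sketch of a prediction-powered confidence interval. It assumes a normal (CLT) approximation and i.i.d. labeled and unlabeled samples. The function names `ppi_mean_ci` and `classical_mean_ci` are hypothetical and do not come from any released package. The idea is to average the model's predictions on the large unlabeled set, then subtract a "rectifier": the average prediction error measured on the small labeled set. The interval width depends on the variance of those errors. This is why more accurate predictions should give a narrower interval than the classical one built from labeled outcomes alone.

```python
import random
from statistics import NormalDist, fmean, variance


def _z(alpha):
    # Two-sided normal quantile z_{1 - alpha/2}.
    return NormalDist().inv_cdf(1 - alpha / 2)


def classical_mean_ci(y_labeled, alpha=0.1):
    """CLT interval for the mean that uses only the labeled outcomes."""
    n = len(y_labeled)
    center = fmean(y_labeled)
    se = (variance(y_labeled) / n) ** 0.5
    return center - _z(alpha) * se, center + _z(alpha) * se


def ppi_mean_ci(y_labeled, yhat_labeled, yhat_unlabeled, alpha=0.1):
    """CLT-based prediction-powered interval for the mean (illustrative sketch)."""
    n, big_n = len(y_labeled), len(yhat_unlabeled)
    if len(yhat_labeled) != n or n < 2 or big_n < 2:
        raise ValueError("need paired labeled data and >= 2 points in each set")
    # Rectifier: average prediction error, measured where ground truth is known.
    errors = [f - y for f, y in zip(yhat_labeled, y_labeled)]
    rectifier = fmean(errors)
    # Prediction-powered estimate: mean prediction on unlabeled data minus the rectifier.
    center = fmean(yhat_unlabeled) - rectifier
    # One variance term comes from the unlabeled predictions and one from the errors;
    # accurate predictions make the error variance (and hence the interval) small.
    se = (variance(yhat_unlabeled) / big_n + variance(errors) / n) ** 0.5
    return center - _z(alpha) * se, center + _z(alpha) * se


if __name__ == "__main__":
    # Synthetic example: the "model" predicts y = x, and the truth is x plus small noise.
    random.seed(0)
    x_lab = [random.gauss(0, 1) for _ in range(100)]
    y_lab = [x + random.gauss(0, 0.1) for x in x_lab]
    x_unl = [random.gauss(0, 1) for _ in range(10_000)]
    print("classical:", classical_mean_ci(y_lab))
    print("PPI:      ", ppi_mean_ci(y_lab, x_lab, x_unl))
```

In this synthetic setup the predictions are close to the truth, so the prediction-powered interval should come out much narrower than the classical one. If the predictions were poor, the error variance would grow and the advantage would shrink. The rectifier still keeps the interval centered on a valid estimate in that case.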