Recent advances in artificial intelligence have enabled the generation of large-scale, low-cost predictions with increasingly high fidelity. As a result, the primary challenge in statistical inference has shifted from data scarcity to data reliability. Prediction-powered inference methods seek to exploit such predictions to improve efficiency when labeled data are limited. However, existing approaches implicitly adopt a use-all philosophy, under which incorporating more predictions is presumed to improve inference. When prediction quality is heterogeneous, this assumption can fail, and indiscriminate use of unlabeled data may dilute informative signals and degrade inferential accuracy. In this paper, we propose Filtered Prediction-Powered Inference (FPPI), a framework that selectively incorporates predictions by identifying a data-adaptive filtered region in which predictions are informative for inference. We show that this region can be consistently estimated under a margin condition, achieving fast rates of convergence. By restricting the prediction-powered correction to the estimated filtered region, FPPI adaptively mitigates the impact of biased or noisy predictions. We establish that FPPI attains strictly improved asymptotic efficiency compared with existing prediction-powered inference methods. Numerical studies and a real-data application to large language model evaluation demonstrate that FPPI substantially reduces reliance on expensive labels by selectively leveraging reliable predictions, yielding accurate inference even in the presence of heterogeneous prediction quality.