We study methods for the simultaneous analysis of many noisy and biased estimates, each paired with an even noisier estimate of its own bias. The analyst's goal is to construct short, calibrated intervals for each parameter. The standard debiasing approach, which subtracts the bias estimate from each biased estimate, inflates variance and yields long intervals. In this paper, we propose an empirical Bayes rebiasing strategy that starts from the fully debiased estimates and learns from the data how much bias to reintroduce by estimating the unknown bias distribution. We provide convergence rates for the coverage of our intervals when the bias distribution is estimated by nonparametric maximum likelihood. Furthermore, we demonstrate substantial precision gains in prediction-powered inference, including pairwise LLM win-rate evaluations, as well as in inference on direct genetic effects in family-based GWAS.
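The variance inflation from full debiasing, and the gain from reintroducing part of the bias, can be illustrated with a small simulation. The sketch below is not the paper's procedure: it assumes independent Gaussian noise with known standard deviations `sigma` and `tau`, and it replaces the nonparametric maximum likelihood estimate of the bias distribution with a simple zero-mean Gaussian prior fitted by the method of moments. It also compares only the mean squared error of point estimates and does not construct intervals. The function name `simulate` and all parameter names and default values are hypothetical.

```python
import numpy as np


def simulate(n=20000, sigma=0.1, tau=0.5, bias_sd=0.2, seed=0):
    """Compare full debiasing with a Gaussian empirical Bayes 'rebiasing' rule.

    Assumed toy model (not taken from the paper):
      biased_i   = theta_i + bias_i + N(0, sigma^2)   precise but biased
      bias_hat_i = bias_i + N(0, tau^2)               noisy bias estimate, tau > sigma
      bias_i     ~ N(0, bias_sd^2)                    unknown to the analyst
    """
    rng = np.random.default_rng(seed)
    theta = rng.normal(0.0, 1.0, n)                 # true parameters
    bias = rng.normal(0.0, bias_sd, n)              # true biases
    biased = theta + bias + rng.normal(0.0, sigma, n)
    bias_hat = bias + rng.normal(0.0, tau, n)

    # Standard debiasing: subtract the whole bias estimate.
    # Under the toy model its error variance is sigma^2 + tau^2.
    debiased = biased - bias_hat

    # Gaussian stand-in for estimating the bias distribution:
    # Var(bias_hat) = Var(bias) + tau^2, so estimate Var(bias) by moments.
    prior_var = max(float(np.var(bias_hat)) - tau**2, 0.0)

    # Posterior-mean weight on the bias estimate; 1 - weight of the
    # estimated bias is effectively left in (reintroduced).
    weight = prior_var / (prior_var + tau**2)
    rebiased = biased - weight * bias_hat

    def mse(estimate):
        return float(np.mean((estimate - theta) ** 2))

    return {"debiased": mse(debiased), "rebiased": mse(rebiased), "weight": weight}


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.4f}")
```

With a weight of 1 the rule reduces to full debiasing, and with a weight of 0 it keeps the biased estimate unchanged. In this toy model the fitted weight falls between the two whenever the bias estimates vary more than their noise alone would explain.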