The simultaneous estimation of many parameters based on data collected from corresponding studies is a key research problem that has received renewed attention in the high-dimensional setting. Many practical situations involve heterogeneous data where heterogeneity is captured by a nuisance parameter. Effectively pooling information across samples while correctly accounting for heterogeneity presents a significant challenge in large-scale estimation problems. We address this issue by introducing the "Nonparametric Empirical Bayes Structural Tweedie" (NEST) estimator, which efficiently estimates the unknown effect sizes and properly adjusts for heterogeneity via a generalized version of Tweedie's formula. For the normal means problem, NEST simultaneously handles the two main selection biases introduced by heterogeneity: the selection bias in the mean and the selection bias in the variance, where the former cannot be effectively corrected without also correcting the latter. We develop theory to show that NEST is asymptotically as good as the optimal Bayes rule that uniquely minimizes a weighted squared error loss. In our simulation studies, NEST outperforms competing methods, with substantial efficiency gains in many settings. The proposed method is illustrated by estimating the batting averages of baseball players and the Sharpe ratios of mutual fund returns. Extensions to other members of the two-parameter exponential family are discussed.
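For orientation, the classical Tweedie's formula that NEST generalizes can be sketched in a few lines. The example below is not the NEST estimator: it assumes a single known variance shared by all observations (the homogeneous case) and estimates the marginal density with a Gaussian kernel whose bandwidth is chosen by a rule of thumb. Under that assumption, the posterior mean is E[mu | x] = x + sigma^2 * d/dx log f(x), where f is the marginal density of the observations. The function name `tweedie_estimate`, the bandwidth choice, and the simulated data are all hypothetical and serve only as illustration.

```python
import numpy as np


def tweedie_estimate(x, sigma, bandwidth):
    """Classical Tweedie shrinkage with a kernel estimate of the marginal density.

    Illustrative only: assumes x_i ~ N(mu_i, sigma^2) with one known sigma.
    """
    x = np.asarray(x, dtype=float)
    diff = x[:, None] - x[None, :]                  # diff[i, j] = x_i - x_j
    kern = np.exp(-0.5 * (diff / bandwidth) ** 2)   # Gaussian kernel weights
    f_hat = kern.sum(axis=1)                        # density up to a constant
    f_prime = (-diff / bandwidth ** 2 * kern).sum(axis=1)  # its derivative in x
    score = f_prime / f_hat                         # d/dx log f(x); constants cancel
    return x + sigma ** 2 * score


# Simulated normal means problem (assumed setup): mu_i ~ N(0, 1), x_i ~ N(mu_i, 1).
rng = np.random.default_rng(0)
n = 1000
mu = rng.normal(0.0, 1.0, size=n)
x = mu + rng.normal(0.0, 1.0, size=n)

# Rule-of-thumb bandwidth (an assumption, not a tuned choice).
h = 1.06 * x.std() * n ** (-1 / 5)
est = tweedie_estimate(x, sigma=1.0, bandwidth=h)

mse_mle = np.mean((x - mu) ** 2)        # error of the unshrunken observations
mse_tweedie = np.mean((est - mu) ** 2)  # error after Tweedie shrinkage
print(f"MSE of raw observations: {mse_mle:.3f}")
print(f"MSE of Tweedie estimate: {mse_tweedie:.3f}")
```

In the heterogeneous setting described in the abstract, each observation carries its own variance as a nuisance parameter, so a single shared sigma and a one-dimensional marginal density, as used in this sketch, no longer suffice.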