Calibrating hierarchical Bayesian domain inference for a proportion

Small area estimation (SAE) improves estimates for local communities or groups, such as counties, neighborhoods, or demographic subgroups, when the data available for each area are too sparse to support reliable direct estimates. This is important for targeting local resources and policies, especially when national-level or large-area data mask variation at a more granular level. Researchers often fit hierarchical Bayesian models to stabilize SAE when data are sparse. Ideally, Bayesian procedures also exhibit good frequentist properties, as assessed by calibrated Bayes metrics. However, hierarchical Bayesian models shrink domain estimates toward the overall mean and may produce credible intervals that do not maintain nominal coverage. Hoff et al. developed Frequentist, but Assisted by Bayes (FAB) intervals for subgroup estimates with normally distributed outcomes. Non-normally distributed data present new challenges, and multiple types of intervals have been proposed for estimating proportions. We examine domain inference with binary outcomes and extend FAB intervals so that their coverage is closer to the nominal level. We describe how to compute FAB intervals for a proportion numerically and evaluate their performance through repeated simulation studies. Leveraging multilevel regression and poststratification (MRP), we further refine SAE to correct for sample selection bias, construct FAB intervals for the MRP estimates, and assess their repeated sampling properties. Finally, we apply the proposed inference methods to estimate COVID-19 infection rates across geographic and demographic subgroups. We find that the FAB intervals come closer to nominal coverage than the credible intervals, at the cost of greater width.
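The coverage problem described above can be illustrated with a minimal sketch. It is not the authors' procedure and it does not compute FAB intervals: it assumes a single domain with a binomial count and a fixed, hypothetical Beta(20, 80) prior that stands in for the shrinkage a hierarchical model would induce. The sample size, prior parameters, and function names are all assumptions chosen for illustration. The code computes the exact frequentist coverage of the equal-tailed 95% posterior credible interval at a fixed true proportion.

```python
from math import comb


def beta_cdf(x, a, b):
    """CDF of Beta(a, b) for positive integers a, b.

    Uses the identity I_x(a, b) = P(Binomial(a + b - 1, x) >= a).
    """
    m = a + b - 1
    return sum(comb(m, k) * x**k * (1 - x) ** (m - k) for k in range(a, m + 1))


def beta_quantile(q, a, b, tol=1e-10):
    """Invert the Beta CDF by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if beta_cdf(mid, a, b) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def credible_interval(y, n, a0, b0, level=0.95):
    """Equal-tailed credible interval under a Beta(a0, b0) prior (assumed)."""
    alpha = 1 - level
    a, b = a0 + y, b0 + n - y  # conjugate Beta posterior parameters
    return beta_quantile(alpha / 2, a, b), beta_quantile(1 - alpha / 2, a, b)


def exact_coverage(p, n, a0, b0, level=0.95):
    """Probability, under Binomial(n, p), that the interval contains p."""
    cover = 0.0
    for y in range(n + 1):
        lo, hi = credible_interval(y, n, a0, b0, level)
        if lo <= p <= hi:
            cover += comb(n, y) * p**y * (1 - p) ** (n - y)
    return cover


# Hypothetical setting: n = 10 observations, prior mean 0.2.
near = exact_coverage(0.2, n=10, a0=20, b0=80)  # true p at the prior mean
far = exact_coverage(0.6, n=10, a0=20, b0=80)   # true p far from the prior mean
print(f"coverage at p=0.2: {near:.3f}; coverage at p=0.6: {far:.3f}")
```

In this assumed setting the interval covers a true proportion equal to the prior mean with probability above the nominal 95%, but never covers a true proportion of 0.6, because the strong prior pulls every posterior toward 0.2. Coverage that holds on average over the prior can therefore fail badly for an individual domain, which is the gap that intervals with guaranteed frequentist coverage aim to close.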