Embedded Multilevel Regression and Poststratification: Model-based Inference with Incomplete Auxiliary Information

Health disparity research often evaluates health outcomes across demographic subgroups. Multilevel regression and poststratification (MRP) is a popular approach to small subgroup estimation because it stabilizes estimates by fitting multilevel models and adjusts for selection bias by poststratifying on auxiliary variables, which are population characteristics predictive of the analytic outcome. However, the granularity and quality of MRP estimates are limited by the availability of the auxiliary variables' joint distribution; data analysts often have access only to the marginal distributions. To overcome this limitation, we embed the estimation of the population cell counts needed for poststratification into the MRP workflow, an approach we call embedded MRP (EMRP). Under EMRP, we generate synthetic populations of the auxiliary variables before implementing MRP. All sources of estimation uncertainty are propagated within a fully Bayesian framework. Through simulation studies, we compare different methods and demonstrate that EMRP improves on the alternatives in the bias-variance tradeoff, yielding valid inferences for the subpopulations of interest. As an illustration, we apply EMRP to the Longitudinal Survey of Wellbeing and estimate food insecurity prevalence among vulnerable groups in New York City. We find that all EMRP estimators correct for the bias in classical MRP while maintaining lower standard errors and narrower confidence intervals than direct imputation with the weighted finite population Bayesian bootstrap (WFPBB) and than design-based estimates. The EMRP estimators do not differ substantially from one another in performance, though we generally recommend WFPBB-MRP for its consistently high coverage rates.
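The poststratification step that MRP relies on can be illustrated with a minimal sketch. It assumes the cell-level outcome estimates and the population cell counts are already in hand; the function name `poststratify`, the 2x2 layout, and all numbers are hypothetical and are not taken from the paper. A subgroup estimate is the count-weighted average of the cell estimates within that subgroup, which is why the joint distribution of the auxiliary variables (the cell counts) is needed.

```python
import numpy as np

def poststratify(cell_means, cell_counts, mask=None):
    """Count-weighted average of cell estimates over the selected cells.

    cell_means  : model-based estimate of the outcome in each cell
    cell_counts : population count of each cell (same shape as cell_means)
    mask        : optional boolean array selecting the cells of a subgroup
    """
    cell_means = np.asarray(cell_means, dtype=float)
    cell_counts = np.asarray(cell_counts, dtype=float)
    if mask is None:
        mask = np.ones(cell_counts.shape, dtype=bool)
    mask = np.asarray(mask, dtype=bool)
    total = cell_counts[mask].sum()
    if total <= 0:
        raise ValueError("selected cells have zero population count")
    return float((cell_means[mask] * cell_counts[mask]).sum() / total)

# Hypothetical 2x2 cross-classification of two auxiliary variables
# (rows: levels of variable A, columns: levels of variable B).
cell_means = np.array([[0.10, 0.20],
                       [0.30, 0.40]])
cell_counts = np.array([[400, 100],
                        [300, 200]])

# Overall population estimate.
overall = poststratify(cell_means, cell_counts)

# Estimate for the subgroup defined by the second level of variable A.
subgroup_mask = np.array([[False, False],
                          [True, True]])
subgroup = poststratify(cell_means, cell_counts, subgroup_mask)
```

In this sketch the cell counts are treated as known. When only the margins are available, the counts themselves must be estimated, and repeating the calculation over draws of both the cell estimates and the cell counts is one way to carry that additional uncertainty through to the final estimate.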