Multi-level modeling is an important approach for analyzing complex survey data collected through multi-stage sampling. However, estimating multi-level models can be challenging when we combine several datasets that have distinct hierarchies and sampling weights. This paper presents a method for combining multiple datasets whose hierarchical structures differ because distinct informative sampling designs were used for the same survey. To develop an approach with complete generality, we propose to define a pseudo-cluster, a cluster containing only a single observation, to unify the data structure and thereby enable estimation of multi-level models that incorporate sampling weights across the combined sample. We justify incorporating sampling weights at each level of the hierarchical model and in doing so define a pseudo-likelihood estimation procedure. Simulation studies illustrate the effect of incorporating sampling designs in this challenging multi-level modeling scenario. They demonstrate that fitting a linear mixed model with sampling weights provides unbiased estimates of model parameters and improves the estimation of the variance components of the random effects. The proposed method is illustrated through a novel application to the National Survey of Healthcare Organizations and Systems, which sought to determine which organizational characteristics or traits, as measured in the surveys, have the strongest average relationship to the percentage of depression and anxiety diagnoses in physician practices in the US.
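The pseudo-cluster idea can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the paper's implementation: the function `add_pseudo_clusters`, the field names (`unit`, `cluster_id`, `y`, `is_pseudo_cluster`), and the toy records are all hypothetical. The sketch handles only the structural step, giving every observation that was sampled without a cluster its own singleton cluster so that the combined sample has a single two-level layout. It does not address sampling weights or the pseudo-likelihood itself.

```python
from collections import Counter


def add_pseudo_clusters(records, cluster_key="cluster_id", prefix="pseudo_"):
    """Return copies of `records` in which every record without a cluster
    is placed in its own singleton (pseudo) cluster.

    All names here are hypothetical and chosen for illustration only.
    """
    # Labels already in use, so that generated labels never collide with them.
    existing = {r[cluster_key] for r in records if r.get(cluster_key) is not None}
    unified = []
    next_id = 0
    for record in records:
        row = dict(record)  # copy, so the input is left untouched
        if row.get(cluster_key) is None:
            label = f"{prefix}{next_id}"
            while label in existing:
                next_id += 1
                label = f"{prefix}{next_id}"
            row[cluster_key] = label
            row["is_pseudo_cluster"] = True
            existing.add(label)
            next_id += 1
        else:
            row["is_pseudo_cluster"] = False
        unified.append(row)
    return unified


# Hypothetical toy data: one sample with a two-level structure
# (units nested in clusters) and one sample of unclustered units.
clustered = [
    {"unit": "p1", "cluster_id": "sysA", "y": 12.0},
    {"unit": "p2", "cluster_id": "sysA", "y": 15.5},
    {"unit": "p3", "cluster_id": "sysB", "y": 9.1},
]
unclustered = [
    {"unit": "p4", "cluster_id": None, "y": 11.3},
    {"unit": "p5", "cluster_id": None, "y": 14.8},
]

combined = add_pseudo_clusters(clustered + unclustered)
cluster_sizes = Counter(row["cluster_id"] for row in combined)
```

After this step every record carries a cluster label, so a single two-level model specification can in principle be applied to the combined sample. How weights are assigned to pseudo-clusters at the cluster level is part of the method described in the paper and is deliberately left out of this sketch.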