We devise survey-weighted pseudo posterior distribution estimators under two-stage informative sampling of both primary clusters and secondary nested units for a one-way analysis of variance (ANOVA) population generating model. This is a simple canonical case in which the population model random effects coincide with the primary clusters; an example is student performance measured by a survey of schools and students, such as the 2000 OECD Programme for International Student Assessment (PISA). We consider estimation on an observed informative sample under two formulations: an augmented pseudo likelihood that co-samples the random effects, and an integrated likelihood that marginalizes the random effects out of the survey-weighted augmented pseudo likelihood. This paper includes a theoretical exposition that enumerates easily verified conditions under which estimation under the augmented pseudo posterior is guaranteed to be consistent at the true generating parameters. We show in simulation that both approaches produce asymptotically unbiased estimates of the generating hyperparameters for the random effects when a key condition on the sum of within-cluster weighted residuals is met. We present a comparison with two frequentist alternatives: an expectation-maximization approach and a composite likelihood method that requires pairwise sampling weights.
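To make the setup concrete, the following is a minimal sketch of a one-way ANOVA population, a two-stage informative sample, and a survey-weighted augmented log pseudo likelihood. It is an illustration under stated assumptions, not the paper's implementation. The function names, the logistic inclusion probabilities, and the Poisson sampling at each stage are hypothetical choices. The form of the pseudo likelihood is also an assumption: it raises each unit's likelihood contribution to its marginal weight and each sampled cluster's random-effect density to the cluster weight, which the abstract does not spell out.

```python
import math
import random


def simulate_population(num_clusters=400, cluster_size=50, mu=0.0,
                        sigma_u=1.0, sigma_e=1.0, seed=0):
    """One-way ANOVA population: y_gi = mu + u_g + e_gi."""
    rng = random.Random(seed)
    population = []
    for _ in range(num_clusters):
        u_g = rng.gauss(0.0, sigma_u)
        ys = [mu + u_g + rng.gauss(0.0, sigma_e) for _ in range(cluster_size)]
        population.append((u_g, ys))
    return population


def inclusion_prob(x):
    """Hypothetical inclusion probability in (0.1, 0.9), increasing in x."""
    return 0.1 + 0.8 / (1.0 + math.exp(-x))


def two_stage_sample(population, seed=1):
    """Poisson sampling of clusters, then of units within sampled clusters.

    Both stages are informative (an assumption for illustration): clusters
    with larger u_g and units with larger residuals are more likely to be drawn.
    """
    rng = random.Random(seed)
    sample = []  # rows: (cluster id, y, cluster weight, within-cluster weight)
    for g, (u_g, ys) in enumerate(population):
        pi_g = inclusion_prob(u_g)
        if rng.random() >= pi_g:
            continue
        for y in ys:
            pi_i = inclusion_prob(y - u_g)
            if rng.random() < pi_i:
                sample.append((g, y, 1.0 / pi_g, 1.0 / pi_i))
    return sample


def normal_logpdf(x, mean, sd):
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - 0.5 * ((x - mean) / sd) ** 2


def augmented_log_pseudo_likelihood(sample, u, mu, sigma_u, sigma_e):
    """Assumed form: weighted unit terms plus weighted random-effect terms."""
    total = 0.0
    seen = set()
    for g, y, w_g, w_i in sample:
        # Unit contribution, exponentiated by the marginal weight w_g * w_i.
        total += w_g * w_i * normal_logpdf(y, mu + u[g], sigma_e)
        if g not in seen:
            # Random-effect contribution, counted once per sampled cluster.
            seen.add(g)
            total += w_g * normal_logpdf(u[g], 0.0, sigma_u)
    return total


population = simulate_population()
sample = two_stage_sample(population)
true_u = {g: u_g for g, (u_g, _) in enumerate(population)}

weights = [w_g * w_i for _, _, w_g, w_i in sample]
weighted_mean = sum(w * row[1] for w, row in zip(weights, sample)) / sum(weights)
unweighted_mean = sum(row[1] for row in sample) / len(sample)

ll_true = augmented_log_pseudo_likelihood(sample, true_u, 0.0, 1.0, 1.0)
ll_off = augmented_log_pseudo_likelihood(sample, true_u, 5.0, 1.0, 1.0)
```

Comparing `weighted_mean` and `unweighted_mean` against the population mean gives a quick check of how informative the design is. In a full analysis the random effects `u` would be sampled jointly with the model parameters rather than fixed at their true values as they are here.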