Generalized linear models are a popular tool in applied statistics, and their maximum likelihood estimators enjoy asymptotic Gaussianity and efficiency. As all models are wrong, it is desirable to understand these estimators' behaviour under model misspecification. We study semiparametric multilevel generalized linear models, in which only the conditional mean of the response is assumed to follow a specific parametric form. Existing estimators from mixed effects models and generalized estimating equations require specification of a conditional covariance, which, when misspecified, can result in inefficient estimates of the fixed effects parameters. It is nevertheless often computationally attractive to consider a restricted, finite-dimensional class of estimators, of the kind these models naturally imply. We introduce sandwich regression, which selects the estimator of minimal variance within a parametric class of estimators over all distributions in the full semiparametric model. We demonstrate numerically, on simulated and real data, the improvements our sandwich regression approach offers over classical mixed effects models and generalized estimating equations.
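The idea of choosing, within a parametric class of estimators, the one with the smallest variance can be illustrated with a deliberately simplified sketch. It uses a linear model for clustered data fitted with an exchangeable working correlation whose parameter `rho` indexes the estimator class, and it picks `rho` by minimising the trace of the sandwich (robust) covariance estimate over a grid. The linear link, the exchangeable structure, the grid search, the trace criterion and the helper names (`gls_fit`, `sandwich_variance`, `select_rho`) are all assumptions made for illustration. This is not the authors' sandwich regression algorithm.

```python
import numpy as np


def gls_fit(X_list, y_list, rho):
    """Hypothetical helper: GLS fit with an exchangeable working correlation rho."""
    p = X_list[0].shape[1]
    bread = np.zeros((p, p))
    xty = np.zeros(p)
    W_invs = []
    for X, y in zip(X_list, y_list):
        m = X.shape[0]
        # Exchangeable working correlation: 1 on the diagonal, rho off the diagonal.
        W = (1.0 - rho) * np.eye(m) + rho * np.ones((m, m))
        W_inv = np.linalg.inv(W)
        W_invs.append(W_inv)
        bread += X.T @ W_inv @ X
        xty += X.T @ W_inv @ y
    beta = np.linalg.solve(bread, xty)
    return beta, bread, W_invs


def sandwich_variance(X_list, y_list, rho):
    """Hypothetical helper: sandwich covariance estimate B^{-1} M B^{-1} of beta(rho)."""
    beta, bread, W_invs = gls_fit(X_list, y_list, rho)
    p = bread.shape[0]
    meat = np.zeros((p, p))
    for X, y, W_inv in zip(X_list, y_list, W_invs):
        # Cluster-level estimating function evaluated at the fitted beta.
        score = X.T @ W_inv @ (y - X @ beta)
        meat += np.outer(score, score)
    bread_inv = np.linalg.inv(bread)
    return beta, bread_inv @ meat @ bread_inv


def select_rho(X_list, y_list, grid):
    """Hypothetical helper: choose rho minimising the trace of the sandwich estimate."""
    traces = [np.trace(sandwich_variance(X_list, y_list, r)[1]) for r in grid]
    return grid[int(np.argmin(traces))], traces


# Simulated clustered data with a covariate-dependent cluster effect, so that
# no single working correlation is the true conditional covariance.
rng = np.random.default_rng(0)
X_list, y_list = [], []
for _ in range(200):
    m = 5
    x = rng.normal(size=m)
    X = np.column_stack([np.ones(m), x])
    u = rng.normal(scale=1.0 + abs(x.mean()))
    y = X @ np.array([1.0, 2.0]) + u + rng.normal(scale=0.5, size=m)
    X_list.append(X)
    y_list.append(y)

grid = np.linspace(0.0, 0.9, 10)
rho_hat, traces = select_rho(X_list, y_list, grid)
beta_hat, V_hat = sandwich_variance(X_list, y_list, rho_hat)
print(rho_hat, beta_hat)
```

Because the sandwich estimate remains valid when the working correlation is wrong, comparing it across `rho` compares genuine variances of the candidate estimators rather than model-based ones; the grid search is only a stand-in for a proper optimisation.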