Linear mixed models (LMMs) are a popular class of methods for analyzing longitudinal and clustered data. However, such models can be sensitive to outliers: when the data are contaminated, inference on model parameters can be biased and prediction of random effects inaccurate. We propose a new approach to robust estimation and inference for LMMs based on a hierarchical gamma-divergence, which uses normalized powered density weights to downweight, in an automated and data-driven way, the effects of outliers occurring in both the errors and the random effects. For estimation and inference, we develop a computationally scalable minorization-maximization algorithm for the resulting objective function, along with a clustered bootstrap method for uncertainty quantification and a Hyvärinen score criterion for selecting the tuning parameter that controls the degree of robustness. Under suitable regularity conditions, we show that the resulting robust estimates can be asymptotically controlled even under a heavy level of (covariate-dependent) contamination. Simulation studies demonstrate that the hierarchical gamma-divergence approach consistently outperforms several currently available methods for robustifying LMMs. We also illustrate the proposed method using data from a multi-center AIDS cohort study.
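To give a feel for what "normalized powered density weights" do, the following is a minimal sketch on a deliberately simplified problem: a single i.i.d. normal sample rather than an LMM. It is not the hierarchical estimator proposed here. It uses the standard gamma-divergence fixed-point updates for a univariate normal, in which each observation receives a weight proportional to its fitted density raised to the power gamma, and the weights are normalized to sum to one. The function names, the value gamma = 0.5, the median/MAD starting values, and the simulated data are all assumptions made for illustration.

```python
import numpy as np


def gamma_weights(x, mu, sigma2, gamma):
    """Normalized powered density weights for a N(mu, sigma2) model.

    The weight of x_i is proportional to f(x_i)^gamma. Multiplicative
    constants of the density cancel in the normalization, so only the
    exponent is needed.
    """
    log_w = -0.5 * gamma * (x - mu) ** 2 / sigma2
    log_w -= log_w.max()  # stabilize the exponentials
    w = np.exp(log_w)
    return w / w.sum()


def robust_normal_fit(x, gamma=0.5, n_iter=200, tol=1e-10):
    """Toy gamma-divergence fit of a univariate normal by iterative reweighting."""
    x = np.asarray(x, dtype=float)
    # Robust starting values (an illustrative choice): median and scaled MAD.
    mu = np.median(x)
    sigma2 = (1.4826 * np.median(np.abs(x - mu))) ** 2
    for _ in range(n_iter):
        w = gamma_weights(x, mu, sigma2, gamma)
        mu_new = np.sum(w * x)
        # The factor (1 + gamma) corrects the shrinkage of the weighted variance.
        sigma2_new = (1.0 + gamma) * np.sum(w * (x - mu_new) ** 2)
        converged = abs(mu_new - mu) + abs(sigma2_new - sigma2) < tol
        mu, sigma2 = mu_new, sigma2_new
        if converged:
            break
    return mu, sigma2, gamma_weights(x, mu, sigma2, gamma)


# Simulated data: 90% from N(0, 1), 10% outliers centred at 10.
rng = np.random.default_rng(0)
clean = rng.normal(0.0, 1.0, size=180)
outliers = rng.normal(10.0, 1.0, size=20)
x = np.concatenate([clean, outliers])

mu_hat, sigma2_hat, w = robust_normal_fit(x, gamma=0.5)
print("sample mean:", x.mean())
print("robust mean:", mu_hat, "robust variance:", sigma2_hat)
print("total weight on outliers:", w[180:].sum())
```

In this toy setting the outliers receive almost no weight, so the robust mean stays close to the centre of the clean data while the ordinary sample mean is pulled toward the contamination. Setting gamma = 0 makes all weights equal and recovers the usual maximum likelihood estimates, which is the sense in which gamma controls the degree of robustness.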