Generalized estimating equations (GEE) are an important tool for analyzing clustered data without fully specifying the multivariate distribution. A recent approach jointly models the mean, variance, and correlation coefficients of clustered data through three sets of regressions (Luo and Pan, 2022). We observe, however, that these estimating equations are a special case of those of Yan and Fine (2004), which further allow the variance to depend on the mean through a variance function. The variance estimators proposed by Luo and Pan (2022) may be incorrect for the variance and correlation parameters because of a subtle dependence induced by the nested structure of the estimating equations. We characterize the model settings in which their variance estimation is invalid and show that the variance estimators in Yan and Fine (2004) correctly account for this dependence. In addition, we introduce a novel model selection criterion that enables simultaneous selection of the mean-scale-correlation model. The sandwich variance estimator and the proposed model selection criterion are evaluated in several simulation studies and in real data analysis, which support their effectiveness in variance estimation and model selection. Our work also extends the R package geepack with the flexibility to apply different working covariance matrices for the variance and correlation structures.
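As a reading aid, the three sets of regressions mentioned above can be sketched in the usual mean-scale-correlation form. The notation here is an assumption and is not taken from the abstract: Y_ij is the j-th response in cluster i; x_ij, z_ij, and w_ijk are covariate vectors; g_1, g_2, g_3 are link functions; and v is the variance function.

```latex
\begin{align*}
  g_1(\mu_{ij})  &= x_{ij}^{\top}\beta,   & \mu_{ij}  &= \mathrm{E}(Y_{ij}), \\
  g_2(\phi_{ij}) &= z_{ij}^{\top}\gamma,  & \mathrm{Var}(Y_{ij}) &= \phi_{ij}\, v(\mu_{ij}), \\
  g_3(\rho_{ijk}) &= w_{ijk}^{\top}\alpha, & \rho_{ijk} &= \mathrm{Corr}(Y_{ij}, Y_{ik}), \quad j \neq k.
\end{align*}
```

Under this sketch, a model that regresses the variance directly, with no dependence on the mean, would correspond to a constant variance function such as v(\mu) = 1. The nesting arises because the scale equations use residuals that depend on the estimate of \beta, and the correlation equations use standardized residuals that depend on the estimates of both \beta and \gamma.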