Generalized estimating equations (GEE) are widely used for correlated data, but with small to moderate numbers of independent clusters the ordinary GEE regression estimators can be substantially biased. We develop a first-order bias-reduction principle for GEE by viewing the estimator as a clustered-data $M$-estimator and deriving an adjustment to the estimating equations that targets the leading bias term while accounting for the dependence of the working covariance on the mean parameters. The resulting class includes three bias-reduced estimators and three one-step bias-corrected analogs, and it nests the bias-corrected estimator of Lunardon and Scharfstein (2017) and the bias-reduced and bias-corrected estimators of Paul and Zhang (2014) as special cases. The framework applies to general response types through correlation-coefficient parameterizations of the association structure and extends to correlated binary data through pairwise odds-ratio parameterizations, yielding the first bias-reduced and bias-corrected GEE estimators under the odds-ratio parameterization. The marginal-mean compatibility constraints for pairwise odds ratios are far less restrictive than those for correlation coefficients, which makes the odds-ratio parameterization better suited to small-sample settings. Under standard regularity conditions, all six estimators have the same asymptotic distribution as the ordinary GEE estimator. Simulation studies show that the proposed estimators reduce bias while maintaining efficiency and coverage close to those of ordinary GEE across a range of settings, and a clinical trial analysis illustrates the proposed estimators in practice. Software is available in the R package geer.