Jeffreys-Type Penalized GEE for Correlated Binary Data with an Odds-Ratio Parameterization

Generalized estimating equations (GEE) are widely used for population-averaged inference on correlated binary responses, but ordinary GEE can fail under separation, a situation that is more likely in small-sample, sparse, or rare-event settings, leading to nonconvergence, infinite or extreme estimates, and unreliable inference. Existing penalized GEE (PGEE) approaches mitigate some of these problems but do not generally guarantee finite estimates under nonindependence working structures, and they often rely on correlation-coefficient parameterizations whose admissible range shrinks as fitted probabilities approach zero or one, forcing the working association toward independence under separation. We propose a PGEE framework that combines a Jeffreys-prior penalty with marginalized odds-ratio working parameterizations. The odds-ratio parameterization avoids this range restriction, while the penalty, with tunable strength $\delta$ and default $\delta = 1/2$, stabilizes estimation under separation. Under working independence, PGEE reduces to the Jeffreys-prior penalized maximum-likelihood estimator, yielding finite estimates for logit, probit, complementary log-log, and cauchit links. Under nonindependence odds-ratio structures, where a formal finiteness guarantee is unavailable, PGEE achieves near-complete empirical convergence even in separated settings. We also propose one-step and hybrid variants, OPGEE and HPGEE, that reduce computational cost. Simulations show that all three variants substantially outperform ordinary GEE under separation while retaining the performance of ordinary GEE in regular settings. We illustrate the method using a respiratory-illness trial in which ordinary GEE fails, and provide an implementation in the R package geer.
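The range restriction on correlation coefficients mentioned above can be illustrated numerically. The following is a minimal sketch, not taken from the paper or from the geer package: it uses the standard Fréchet bounds on the joint probability of two binary responses to compute the admissible correlation range for given marginal probabilities, and the Plackett formula to recover the joint probability from an odds ratio. The function names (`correlation_bounds`, `joint_from_odds_ratio`, `implied_odds_ratio`) are hypothetical and chosen for illustration only.

```python
import math


def correlation_bounds(p1, p2):
    """Admissible range of corr(Y1, Y2) for binary Y1, Y2 with means p1, p2.

    Follows from the Frechet bounds on p11 = P(Y1 = 1, Y2 = 1):
    max(0, p1 + p2 - 1) <= p11 <= min(p1, p2).
    """
    denom = math.sqrt(p1 * (1.0 - p1) * p2 * (1.0 - p2))
    p11_min = max(0.0, p1 + p2 - 1.0)
    p11_max = min(p1, p2)
    return (p11_min - p1 * p2) / denom, (p11_max - p1 * p2) / denom


def joint_from_odds_ratio(p1, p2, psi):
    """P(Y1 = 1, Y2 = 1) implied by marginals p1, p2 and odds ratio psi > 0.

    Uses the Plackett formula; psi = 1 corresponds to independence.
    """
    if math.isclose(psi, 1.0):
        return p1 * p2
    a = 1.0 + (p1 + p2) * (psi - 1.0)
    disc = a * a - 4.0 * psi * (psi - 1.0) * p1 * p2
    return (a - math.sqrt(disc)) / (2.0 * (psi - 1.0))


def implied_odds_ratio(p1, p2, p11):
    """Odds ratio of the 2x2 table with marginals p1, p2 and cell p11."""
    p10 = p1 - p11
    p01 = p2 - p11
    p00 = 1.0 - p1 - p2 + p11
    return (p11 * p00) / (p10 * p01)


if __name__ == "__main__":
    # The correlation range narrows as one marginal probability nears zero.
    for p2 in (0.5, 0.1, 0.01, 0.001):
        lo, hi = correlation_bounds(0.5, p2)
        print(f"p1=0.5, p2={p2}: correlation in [{lo:.4f}, {hi:.4f}]")

    # An odds ratio of 50 remains compatible with the same extreme marginals.
    p11 = joint_from_odds_ratio(0.5, 0.01, 50.0)
    print("p11 =", p11, "odds ratio =", implied_odds_ratio(0.5, 0.01, p11))
```

With one marginal fixed at 0.5, the admissible correlation interval contracts toward zero as the other marginal approaches zero, whereas any positive odds ratio still yields a valid joint distribution. This is the motivation for an odds-ratio working parameterization; the working structures actually used by the authors may differ from this pairwise sketch.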
