Evaluating tests for cluster-randomized trials with few clusters under generalized linear mixed models with covariate adjustment: a simulation study

Generalized linear mixed models (GLMMs) are commonly used to analyze clustered data, but when the number of clusters is small to moderate, standard statistical tests may produce elevated type I error rates. Small-sample corrections have been proposed for continuous or binary outcomes without covariate adjustment. However, the appropriate tests to use for count outcomes or under covariate-adjusted models remain unknown. An important setting in which this issue arises is the cluster-randomized trial (CRT). Because many CRTs have just a few clusters (e.g., clinics or health systems), covariate adjustment is particularly critical to address potential chance imbalance and/or low power (e.g., adjustment following stratified randomization or for the baseline value of the outcome). We conducted simulations to evaluate GLMM-based tests of the treatment effect that account for a small (10) or moderate (20) number of clusters in a parallel-group CRT setting, across scenarios of covariate adjustment (including adjustment for one or more person-level or cluster-level covariates), for both binary and count outcomes. We found that when the intraclass correlation is non-negligible ($\geq 0.01$) and the number of covariates is small ($\leq 2$), likelihood ratio tests with between-within denominator degrees of freedom have type I error rates close to the nominal level. When the number of covariates is moderate ($\geq 5$), the relative performance of the tests varied considerably across our simulation scenarios, and no method performed uniformly well. Therefore, we recommend adjusting for no more than a few covariates and using likelihood ratio tests with between-within denominator degrees of freedom.
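To make the recommended test concrete, the following minimal Python sketch illustrates one common reading of a likelihood ratio test with between-within denominator degrees of freedom: the likelihood ratio statistic for the single treatment parameter is referred to an $F(1, \nu)$ distribution rather than to $\chi^2_1$. It rests on assumptions that the abstract does not state. The degrees-of-freedom rule used here (number of clusters minus the intercept, the treatment indicator, and any cluster-level covariates) is assumed, and the function names and the example statistic are hypothetical. Fitting the GLMM itself is out of scope, so the statistic is supplied by hand.

```python
import math


def between_within_ddf(n_clusters, n_cluster_level_covariates=0):
    """Assumed between-within rule for a cluster-level treatment effect:
    clusters minus cluster-level fixed effects (intercept, treatment
    indicator, and any cluster-level covariates)."""
    ddf = n_clusters - (2 + n_cluster_level_covariates)
    if ddf < 1:
        raise ValueError("too few clusters for this many cluster-level terms")
    return ddf


def t_density(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(t * t / df))


def f_pvalue_1df(stat, ddf, n_steps=2000):
    """Upper-tail probability of F(1, ddf) at stat.

    Uses the identity F(1, ddf) = t(ddf)^2 and integrates the t density
    from 0 to sqrt(stat) with Simpson's rule (n_steps must be even).
    """
    if stat <= 0:
        return 1.0
    upper = math.sqrt(stat)
    h = upper / n_steps
    total = t_density(0.0, ddf) + t_density(upper, ddf)
    for i in range(1, n_steps):
        total += (4 if i % 2 else 2) * t_density(i * h, ddf)
    central_half = total * h / 3  # P(0 < T < sqrt(stat))
    return max(0.0, 1.0 - 2.0 * central_half)


def chi2_pvalue_1df(stat):
    """Upper-tail probability of chi-square with 1 df (the uncorrected test)."""
    return math.erfc(math.sqrt(stat / 2)) if stat > 0 else 1.0


# Hypothetical example: 10 clusters, no cluster-level covariates, and a
# made-up likelihood ratio statistic of 4.2 for the treatment effect.
lr_stat = 4.2
ddf = between_within_ddf(n_clusters=10)
print("ddf:", ddf)
print("chi-square p-value:", chi2_pvalue_1df(lr_stat))
print("F(1, ddf) p-value:", f_pvalue_1df(lr_stat, ddf))
```

With few clusters, the $F$-referenced p-value is larger than the $\chi^2$-referenced one for the same statistic, which is the mechanism by which a denominator degrees-of-freedom correction can pull an inflated type I error rate back toward the nominal level.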
