Maximin optimal cluster randomized designs for assessing treatment effect heterogeneity

Cluster randomized trials (CRTs) are studies in which treatment is randomized at the cluster level while outcomes are typically collected at the individual level. When CRTs are used in pragmatic settings, baseline population characteristics may moderate treatment effects, giving rise to heterogeneous treatment effects (HTEs). Pre-specified, hypothesis-driven HTE analyses in CRTs can clarify how interventions affect outcomes in different subpopulations. Closed-form sample size formulas that assume known intracluster correlation coefficients (ICCs) for both the covariate and the outcome have recently been proposed, but guidance on optimal cluster randomized designs that maximize power for pre-specified HTE analyses has not yet been developed. We derive new design formulas that determine the cluster size and number of clusters of the locally optimal design (LOD), which minimizes the variance of the HTE parameter estimator under a budget constraint. Because the LODs depend on covariate- and outcome-ICC values that are usually unknown, we further develop the maximin design for assessing HTE, which identifies the combination of design resources that maximizes the relative efficiency of the HTE analysis in the worst-case scenario. In addition, because the average treatment effect is often of primary interest, we also establish optimal designs that accommodate multiple objectives by jointly accounting for the average and heterogeneous treatment effects. We illustrate our methods in the context of the Kerala Diabetes Prevention Program CRT and provide an R Shiny app that calculates optimal designs under a wide range of design parameters.
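The LOD and maximin ideas can be illustrated with a small grid search. The sketch below is not the authors' derivation or their Shiny app. It assumes the interaction-variance expression from earlier HTE sample-size work for an individual-level covariate, Var ∝ σ²_{y|x}(1−ρ_y)[1+(m−1)ρ_y] / {n·m·π(1−π)·σ²_x·[1+(m−2)ρ_y−(m−1)ρ_xρ_y]}, where ρ_x and ρ_y are the covariate and outcome ICCs, m is the cluster size, n is the number of clusters, and π is the proportion of clusters randomized to treatment. It also assumes a hypothetical linear budget B = n(c_cluster + c_indiv·m), with n treated as continuous. All function names, cost values, and ICC ranges are illustrative assumptions.

```python
import numpy as np


def hte_variance(m, n, rho_x, rho_y, pi=0.5, sigma2_yx=1.0, sigma2_x=1.0):
    """ASSUMED variance of the treatment-by-covariate interaction estimator
    for an individual-level covariate in a two-arm CRT with n clusters of
    size m (form taken from earlier HTE sample-size work; verify before use)."""
    num = sigma2_yx * (1 - rho_y) * (1 + (m - 1) * rho_y)
    den = (n * m * pi * (1 - pi) * sigma2_x
           * (1 + (m - 2) * rho_y - (m - 1) * rho_x * rho_y))
    return num / den


def variance_under_budget(m, rho_x, rho_y, budget, c_cluster, c_indiv):
    """HYPOTHETICAL linear cost model: budget = n * (c_cluster + c_indiv * m),
    so the affordable (non-integer) number of clusters is determined by m."""
    n = budget / (c_cluster + c_indiv * m)
    return hte_variance(m, n, rho_x, rho_y)


def locally_optimal_m(rho_x, rho_y, m_grid, **cost):
    """LOD: cluster size on m_grid minimizing the HTE variance when the
    ICCs (rho_x, rho_y) are treated as known."""
    m_grid = np.asarray(m_grid)
    v = variance_under_budget(m_grid, rho_x, rho_y, **cost)
    return int(m_grid[np.argmin(v)])


def maximin_m(rho_x_grid, rho_y_grid, m_grid, **cost):
    """Maximin design: cluster size maximizing the worst-case relative
    efficiency RE(m | rho) = Var(LOD(rho)) / Var(m, rho) over an ICC grid."""
    m_grid = np.asarray(m_grid)
    M, RX, RY = np.meshgrid(m_grid, rho_x_grid, rho_y_grid, indexing="ij")
    V = variance_under_budget(M, RX, RY, **cost)   # axes: (m, rho_x, rho_y)
    re = V.min(axis=0, keepdims=True) / V           # RE lies in (0, 1]
    worst_re = re.min(axis=(1, 2))                  # worst case for each m
    best = int(np.argmax(worst_re))
    return int(m_grid[best]), float(worst_re[best])


if __name__ == "__main__":
    # Illustrative numbers only, not taken from the paper or the trial.
    cost = dict(budget=100_000, c_cluster=500, c_indiv=20)
    m_grid = np.arange(2, 201)
    print("LOD m:", locally_optimal_m(0.1, 0.05, m_grid, **cost))
    m_mm, re_mm = maximin_m(np.linspace(0, 0.5, 11),
                            np.linspace(0.01, 0.2, 20), m_grid, **cost)
    print("maximin m:", m_mm, "worst-case RE:", round(re_mm, 3))
```

Because the budget fixes n once m is chosen, each design here is indexed by the cluster size alone. The reported worst-case RE indicates how much efficiency the maximin choice can lose, relative to the LOD, at the least favorable ICC pair on the assumed grid.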

翻译：整群随机试验（CRTs）是将处理在整群层面进行随机化，但结果通常在个体层面收集的研究。当CRTs应用于实用环境中时，基线人群特征可能会调节处理效应，导致所谓的异质性处理效应（HTEs）。在CRTs中进行预先指定、假设驱动的HTE分析，可以理解干预措施如何影响亚群结果。尽管近期已提出假设协方差和结果变量的组内相关系数（ICCs）已知的闭合样本量公式，但尚未开发出确保预先指定HTE分析具有最大统计效力的最优整群随机设计指南。我们推导出新的设计公式，以在预算约束下确定群规模和群数量，从而实现最小化HTE参数估计方差的局部最优设计（LOD）。由于LOD基于通常未知的协变量和结果ICC值，我们进一步开发用于评估HTE的最大最小设计，在最坏情况下识别使HTE分析相对效率最大化的设计资源组合。此外，鉴于平均处理效应分析通常为主要关注点，我们还通过结合平均和异质性处理效应的研究考虑，建立了适应多重目标的最优设计。我们以喀拉拉邦糖尿病预防计划CRT为背景说明方法，并提供一个R Shiny应用程序以促进在广泛设计参数下计算最优设计。