Statistical inference with win statistics in cluster-randomized trials with composite outcomes

Win statistics have become increasingly popular for analyzing hierarchical composite endpoints in clinical trials because they summarize treatment benefit through pairwise comparisons that respect the order of clinical importance among outcome components. The win ratio, win odds, net benefit, and desirability of outcome ranking (DOOR) are all based on the same underlying pairwise comparison methodology and can complement one another in conveying the strength of the treatment effect. Despite recent methodological progress, statistical inference for win statistics in cluster-randomized trials (CRTs) remains underdeveloped. In this paper, we provide a comprehensive survey of testing procedures for the win ratio, win odds, net benefit, and DOOR in parallel-arm CRTs with hierarchical composite outcomes. For each win statistic, we then compare different testing procedures, including Wald tests based on cluster rank-sum statistics and bivariate clustered U-statistics, tests that use a cluster jackknife variance, a score permutation test, a permutation-based procedure with analytical variance estimation, and a likelihood ratio test derived from clustered jackknife estimates. Through simulation studies that vary cluster sizes, intracluster correlations, and censoring-induced ties, we characterize the finite-sample type I error and power of each procedure across a range of practical settings with small and large numbers of clusters. We illustrate our methods by reanalyzing the Strategies to Reduce Injuries and Develop Confidence in Elders (STRIDE) pragmatic CRT, and implement all win statistics methods in the WinsCRT R package.
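To make the shared pairwise comparison idea concrete, the following is a minimal Python sketch of how the win ratio, win odds, and net benefit can be computed as point estimates from all treated-versus-control pairs. It is an illustration under simplifying assumptions, not the WinsCRT implementation: the function names `compare` and `win_statistics` are hypothetical, each patient is represented as a tuple of fully observed outcomes ordered from most to least important with larger values being better, and censoring and clustering are ignored entirely. In particular, it does not address variance estimation or testing in CRTs, which is the subject of the paper.

```python
from itertools import product


def compare(a, b):
    """Hierarchical comparison of two patients (hypothetical helper).

    Each patient is a tuple of outcomes ordered by clinical priority,
    with larger values assumed to be better. Returns 1 if `a` wins,
    -1 if `b` wins, and 0 if the pair is tied on every component.
    """
    for x, y in zip(a, b):
        if x > y:
            return 1
        if x < y:
            return -1
    return 0  # tied on all components


def win_statistics(treated, control):
    """Point estimates from all treated-vs-control pairs.

    Assumes fully observed outcomes (no censoring) and ignores clustering.
    """
    wins = losses = ties = 0
    for t, c in product(treated, control):
        result = compare(t, c)
        if result > 0:
            wins += 1
        elif result < 0:
            losses += 1
        else:
            ties += 1
    n_pairs = len(treated) * len(control)
    return {
        # Win ratio: wins over losses, ties excluded.
        "win_ratio": wins / losses if losses else float("inf"),
        # Win odds: ties split evenly between the two arms.
        "win_odds": (wins + 0.5 * ties) / (losses + 0.5 * ties),
        # Net benefit: difference in win and loss proportions.
        "net_benefit": (wins - losses) / n_pairs,
    }


# Toy data: (priority-1 outcome, priority-2 outcome) per patient.
treated = [(1, 3), (1, 1), (0, 2)]
control = [(1, 2), (0, 2), (0, 1)]
stats = win_statistics(treated, control)
```

In this toy example the nine pairs yield six wins, two losses, and one tie for the treated arm. The three statistics differ only in how they combine these counts, which is why they can be reported side by side.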
