Unified implementation and comparison of Bayesian shrinkage methods for treatment effect estimation in subgroups

Evaluating treatment effect heterogeneity across patient subgroups is a fundamental aspect of clinical trial analysis. Yet these analyses are inherently limited by small sample sizes and the large number of subgroups investigated. Statisticians in regulatory agencies and pharmaceutical companies have begun considering shrinkage methods grounded in Bayesian statistical theory. These methods place priors on treatment effect heterogeneity, which in practice shrink raw subgroup treatment effect estimates towards the overall treatment effect. Various shrinkage estimators and priors have been proposed, yet it remains unclear which methods perform best. This work provides a unified presentation, software implementation (in the R package bonsaiforest2), and simulation comparison of one-way and global shrinkage methods for continuous, binary, count, and time-to-event endpoints. One-way models fit a separate shrinkage model for each subgrouping variable, whereas global models fit a single model including all subgroup indicators at once. Both can be used to derive standardized subgroup-specific treatment effects. Across all simulation scenarios, shrinkage methods outperformed the standard subgroup estimator without shrinkage in terms of mean squared error. They were also more efficient in identifying a non-efficacious subgroup. Global shrinkage models tended to have smaller mean squared error and less dependence on hyperprior parameters than one-way models, but also exhibited slightly larger bias and worse frequentist coverage of the associated credible intervals. For both model types, hyperprior choices anchored in the trial's assumptions about the anticipated size of the overall treatment effect performed well. We conclude that some degree of shrinkage is preferable to none and advocate for the routine inclusion of shrunken estimates in clinical forest plots to facilitate more robust decision-making.
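
To make the idea of shrinking raw subgroup estimates towards the overall treatment effect concrete, the following is a minimal illustrative sketch in Python. It is not the bonsaiforest2 implementation and not one of the models compared in this work, which place hyperpriors on the heterogeneity. It assumes a simple normal-normal hierarchical model in which the between-subgroup standard deviation `tau` is treated as known and the overall effect has a flat prior. The function name `shrink_subgroup_effects` and the example numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def shrink_subgroup_effects(
    estimates: Sequence[float],
    std_errors: Sequence[float],
    tau: float,
) -> Tuple[float, List[float]]:
    """Shrink raw subgroup estimates towards a pooled overall effect.

    Assumed model (illustration only): y_i ~ N(theta_i, se_i^2),
    theta_i ~ N(mu, tau^2), flat prior on mu, tau known.
    Returns (overall effect estimate, list of shrunken subgroup estimates).
    """
    if tau < 0:
        raise ValueError("tau must be non-negative")
    if len(estimates) != len(std_errors) or len(estimates) == 0:
        raise ValueError("estimates and std_errors must be non-empty and equal length")
    if any(se <= 0 for se in std_errors):
        raise ValueError("standard errors must be positive")

    # Marginally, y_i ~ N(mu, se_i^2 + tau^2), so the overall effect is a
    # precision-weighted mean of the raw subgroup estimates.
    weights = [1.0 / (se ** 2 + tau ** 2) for se in std_errors]
    overall = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)

    shrunken = []
    for y, se in zip(estimates, std_errors):
        # Weight on the raw estimate: near 1 for precise subgroups or large tau,
        # near 0 for noisy subgroups or small tau.
        b = tau ** 2 / (tau ** 2 + se ** 2)
        shrunken.append(b * y + (1.0 - b) * overall)
    return overall, shrunken


if __name__ == "__main__":
    # Hypothetical log hazard ratios and standard errors for three subgroups.
    raw = [-0.5, -0.1, 0.2]
    ses = [0.2, 0.2, 0.2]
    overall, shrunk = shrink_subgroup_effects(raw, ses, tau=0.2)
    print("overall:", overall)
    print("shrunken:", shrunk)
```

In this sketch a smaller `tau` pulls every subgroup estimate closer to the overall effect, and `tau = 0` collapses them all onto it. A fully Bayesian analysis would instead put a hyperprior on the heterogeneity rather than fixing it.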
