Non-significant randomized controlled trials can hide subgroups of good responders to experimental drugs, thus hindering subsequent development. Identifying such heterogeneous treatment effects is key for precision medicine, and many post-hoc analysis methods have been developed for that purpose. While several benchmarks have been carried out to identify the strengths and weaknesses of these methods, notably for binary and continuous endpoints, similar systematic empirical evaluations of subgroup analysis for time-to-event endpoints are lacking. This work aims to fill this gap by evaluating several subgroup analysis algorithms in the context of time-to-event outcomes, organized around three research questions: Is there heterogeneity? Which biomarkers are responsible for such heterogeneity? Who are the good responders to treatment? In this context, we propose a new synthetic and semi-synthetic data generation process that allows one to explore a wide range of heterogeneity scenarios with precise control over the level of heterogeneity. We provide an open-source Python package, available on GitHub, containing our generation process and our comprehensive benchmark framework. We hope this package will be useful to the research community for future investigations of heterogeneity of treatment effects and for benchmarking subgroup analysis methods.