Non-significant randomized controlled trials can hide subgroups of good responders to experimental drugs, thus hindering subsequent development. Identifying such heterogeneous treatment effects is key for precision medicine, and many post-hoc analysis methods have been developed for that purpose. While several benchmarks have been carried out to identify the strengths and weaknesses of these methods, notably for binary and continuous endpoints, similar systematic empirical evaluations of subgroup analysis for time-to-event endpoints are lacking. This work aims to fill this gap by evaluating several subgroup analysis algorithms in the context of time-to-event outcomes, through three research questions: Is there heterogeneity? Which biomarkers are responsible for such heterogeneity? Who are the good responders to treatment? In this context, we propose a new synthetic and semi-synthetic data generation process that allows one to explore a wide range of heterogeneity scenarios with precise control over the level of heterogeneity. We provide an open-source Python package, available on GitHub, containing our generation process and our comprehensive benchmark framework. We hope this package will be useful to the research community for future investigations of heterogeneity of treatment effects and for benchmarking subgroup analysis methods.
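To make the idea of a controllable level of heterogeneity concrete, the following is a minimal sketch of a synthetic time-to-event generator. It is not the generation process of the package described above: the function name `simulate_trial`, the single biomarker, the exponential hazard model, and all numeric values are assumptions chosen for illustration only. A single `heterogeneity` parameter sets the treatment log hazard ratio inside a hypothetical subgroup of good responders, while treatment has no effect outside it.

```python
import numpy as np


def simulate_trial(n=1000, heterogeneity=1.0, seed=0):
    """Simulate a two-arm trial with a right-censored time-to-event endpoint.

    Illustrative assumptions only: one biomarker, exponential hazards,
    and a subgroup defined by biomarker > 0.
    """
    rng = np.random.default_rng(seed)
    biomarker = rng.normal(size=n)          # single continuous biomarker
    treated = rng.integers(0, 2, size=n)    # 1:1 randomization (0 or 1)
    subgroup = biomarker > 0                # hypothetical good responders

    # Treatment log hazard ratio: 0 outside the subgroup and
    # -heterogeneity inside it (negative values mean benefit).
    log_hr = -heterogeneity * subgroup
    hazard = 0.1 * np.exp(0.5 * biomarker + treated * log_hr)

    event_time = rng.exponential(1.0 / hazard)           # latent event times
    censor_time = rng.exponential(1.0 / 0.05, size=n)    # independent censoring
    time = np.minimum(event_time, censor_time)           # observed follow-up
    event = event_time <= censor_time                    # True if event observed
    return biomarker, treated, subgroup, time, event


if __name__ == "__main__":
    b, a, s, t, e = simulate_trial(n=5000, heterogeneity=1.5, seed=42)
    for label, mask in [("subgroup", s), ("complement", ~s)]:
        rate_treated = e[mask & (a == 1)].mean()
        rate_control = e[mask & (a == 0)].mean()
        print(f"{label}: event rate treated={rate_treated:.2f}, "
              f"control={rate_control:.2f}")
```

Setting `heterogeneity=0` gives a trial with no treatment effect in any patient, and increasing it widens the gap between the subgroup and its complement. In such a setup, the three research questions map onto testing whether `heterogeneity` differs from zero, recovering `biomarker` as the driver, and recovering the `subgroup` mask.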