We conducted a systematic comparison of statistical methods for analyzing time-to-event outcomes under various proportional and nonproportional hazards (NPH) scenarios. Using data from recently published oncology trials, we compared the log-rank test, still by far the most widely used option, against several available alternatives: the MaxCombo test, the restricted mean survival time difference (dRMST) test, the generalized gamma model (GGM), and the generalized F model (GFM). We evaluated and compared these methods in terms of power, type I error rate, and time-dependent bias with respect to the RMST difference, the survival probability difference, and the median survival time. In addition to the real data, we simulated three hypothetical scenarios with crossing hazards, chosen so that the early and late effects 'cancel out', and used them to evaluate the ability of each method to detect time-specific and overall treatment effects. We implemented novel metrics for assessing time-dependent bias in treatment effect estimates to provide a more comprehensive evaluation in NPH scenarios. Based on the type I error rate, power, and time-dependent bias associated with each statistical approach, we provide recommendations for each NPH scenario.
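To make the RMST difference concrete, the following is a minimal sketch, not the implementation used in this study. It assumes the standard nonparametric definition of RMST as the area under the Kaplan-Meier curve from 0 to a truncation time `tau`. The function names (`kaplan_meier`, `rmst`, `rmst_difference`) and the toy data are hypothetical, and the sketch omits the variance estimate that a dRMST test would also need.

```python
def kaplan_meier(times, events):
    """Return the Kaplan-Meier curve as a list of (event_time, survival) steps.

    times  : follow-up times
    events : 1 if the event was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        # Events and censored subjects both leave the risk set.
        n_at_risk -= removed
    return curve


def rmst(times, events, tau):
    """Area under the Kaplan-Meier step function on [0, tau]."""
    area = 0.0
    prev_t, prev_s = 0.0, 1.0
    for t, s in kaplan_meier(times, events):
        if t >= tau:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    # Last (possibly partial) step up to the truncation time.
    area += prev_s * (tau - prev_t)
    return area


def rmst_difference(arm_a, arm_b, tau):
    """RMST(arm_a) - RMST(arm_b); each arm is a (times, events) pair."""
    return rmst(*arm_a, tau) - rmst(*arm_b, tau)


if __name__ == "__main__":
    # Hypothetical toy data, for illustration only.
    treatment = ([2, 3, 4, 5], [1, 1, 0, 1])
    control = ([1, 2, 3, 4], [1, 1, 1, 1])
    print(rmst_difference(treatment, control, tau=4.0))
```

Because the estimate depends on the choice of `tau`, evaluating `rmst_difference` over a grid of truncation times is one simple way to see how an estimated effect changes over follow-up when hazards cross.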