Robust modestly weighted log-rank tests

The introduction of checkpoint inhibitors in immuno-oncology has raised questions about the suitability of the log-rank test as the default primary analysis method in confirmatory studies, particularly when survival curves exhibit non-proportional hazards. The log-rank test controls the false positive rate, but it may lose power when survival curves remain similar for an extended period before diverging. To address this, various weighted versions of the log-rank test have been proposed, including the MaxCombo test, which combines multiple weighted log-rank statistics to enhance power across a range of alternative hypotheses. Despite its potential, the MaxCombo test has seen limited adoption, possibly because it can produce counterintuitive results when the hazard functions on the two arms cross. In response, the modestly weighted log-rank test was developed to provide a balanced approach, giving greater weight to later event times while avoiding undue influence from early detrimental effects. However, this test also has limitations, particularly if early separation of the survival curves cannot be ruled out a priori. We propose a novel test statistic that integrates the strengths of the standard log-rank test, the modestly weighted log-rank test, and the MaxCombo test. By taking the maximum of the standard log-rank statistic and a modestly weighted log-rank statistic, the new test aims to maintain power under delayed effect scenarios while limiting the power loss, relative to the log-rank test, in worst-case scenarios. Simulation studies and a case study demonstrate the efficiency and robustness of this approach, highlighting its potential as an alternative for primary analysis in immuno-oncology trials.
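As a rough illustration of the idea of taking the maximum of a standard and a modestly weighted log-rank statistic, the following Python sketch computes both standardized statistics from right-censored two-arm data and a one-sided p-value for their maximum. Several details are assumptions made for illustration and are not taken from the text above: the modest weight is taken to be 1 / max(S(t-), s*), with S the pooled Kaplan-Meier estimate and `s_star` a hypothetical tuning parameter; the sign convention is chosen so that large positive values favor the experimental arm (`arm == 1`); and the p-value uses a bivariate normal approximation for the joint null distribution of the two statistics, in the spirit of MaxCombo-type tests. The function names are likewise hypothetical.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def both_below(z, rho, steps=4000, lower=-8.0):
    """P(Z1 <= z, Z2 <= z) for a standard bivariate normal with correlation rho,
    via trapezoidal integration of phi(x) * Phi((z - rho*x) / sqrt(1 - rho^2))."""
    if rho >= 1.0 - 1e-9:          # statistics (almost) identical
        return norm_cdf(z)
    if z <= lower:
        return 0.0
    s = math.sqrt(1.0 - rho * rho)
    h = (z - lower) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lower + i * h
        f = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi) * norm_cdf((z - rho * x) / s)
        total += f if 0 < i < steps else 0.5 * f
    return total * h

def risk_table(time, event, arm):
    """For each distinct event time: (at risk, at risk on control, events, events on control)."""
    rows = []
    for t in sorted({ti for ti, ei in zip(time, event) if ei}):
        n = sum(1 for ti in time if ti >= t)
        n0 = sum(1 for ti, a in zip(time, arm) if ti >= t and a == 0)
        d = sum(1 for ti, e in zip(time, event) if ti == t and e)
        d0 = sum(1 for ti, e, a in zip(time, event, arm) if ti == t and e and a == 0)
        rows.append((n, n0, d, d0))
    return rows

def max_lr_mw_test(time, event, arm, s_star=0.5):
    """Maximum of the log-rank and an (assumed) modestly weighted log-rank statistic."""
    surv = 1.0                      # pooled Kaplan-Meier estimate just before each event time
    u_lr = u_mw = v_lr = v_mw = cov = 0.0
    for n, n0, d, d0 in risk_table(time, event, arm):
        w = 1.0 / max(surv, s_star)             # assumed modest weight, capped at 1/s_star
        diff = d0 - d * n0 / n                  # observed minus expected control events
        var = n0 * (n - n0) * d * (n - d) / (n * n * (n - 1)) if n > 1 else 0.0
        u_lr += diff
        u_mw += w * diff
        v_lr += var
        v_mw += w * w * var
        cov += w * var                          # covariance of the two score statistics
        surv *= 1.0 - d / n
    z_lr = u_lr / math.sqrt(v_lr)
    z_mw = u_mw / math.sqrt(v_mw)
    rho = cov / math.sqrt(v_lr * v_mw)
    z_max = max(z_lr, z_mw)
    p = 1.0 - both_below(z_max, rho)            # one-sided p-value for the maximum
    return {"z_lr": z_lr, "z_mw": z_mw, "rho": rho, "z_max": z_max, "p": p}

# Toy data (invented for illustration): control events occur steadily,
# while the experimental arm separates only after an initial delay.
time = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1.5, 2.5, 3.5, 9.5, 12, 14, 16, 18, 20, 22]
event = [1] * 20
arm = [0] * 10 + [1] * 10
result = max_lr_mw_test(time, event, arm)
```

Setting `s_star=1.0` makes every weight equal to one, so the two statistics coincide and the sketch reduces to an ordinary one-sided log-rank test. The sketch assumes at least one event time with more than one patient at risk; it does not guard against a zero variance.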
