Precision oncology aims to prescribe the optimal cancer treatment to the right patients, thereby maximizing therapeutic benefit. However, identifying from randomized clinical trial data the patient subgroups that benefit more from an experimental cancer treatment remains a significant analytical challenge. To address this, we introduce a novel unsupervised machine learning approach based on very dense random survival forests (up to 100,000 trees), equipped with a new splitting rule that explicitly targets treatment-effect heterogeneity. The method is robust and interpretable, and it effectively identifies responsive subgroups. Extensive simulations confirm that it detects heterogeneous patient responses and distinguishes datasets with heterogeneity from those without, while maintaining a stringent Type I error rate of 1%. We further validate the method on Phase III randomized clinical trial datasets, where it reveals significant heterogeneity in treatment response associated with baseline patient characteristics.