Identifying subgroups of patients who benefit from a treatment is a key aspect of personalized medicine; these subgroups can be used to develop individualized treatment rules (ITRs). Many machine learning methods have been proposed to create such rules. However, the extent to which these methods lead to the same ITRs, i.e., recommend the same treatment for the same individuals, is unclear. To assess whether methods lead to similar ITRs, we compared the most common approaches in two randomized controlled trials. Two classes of methods for developing an ITR can be distinguished. Methods in the first class predict individualized treatment effects, from which an ITR is derived by recommending the evaluated treatment to individuals with a predicted benefit. Methods in the second class estimate the ITR directly, without estimating individualized treatment effects. For each trial, the performance of the ITRs was assessed with various metrics, and the pairwise agreement between ITRs was calculated. The ITRs obtained by the different methods generally disagreed considerably on which individuals should be treated. Concordance was better among closely related methods. Overall, when performance was evaluated in a validation sample, all methods produced ITRs with limited performance, suggesting a high potential for overfitting. The different methods do not lead to similar ITRs and are therefore not interchangeable. The choice of method strongly influences which patients end up receiving a given treatment, which raises concerns about the practical use of these methods.
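
As an illustration of the comparison described above, the following minimal sketch derives an ITR from predicted individualized treatment effects and computes the pairwise agreement between rules. It rests on several assumptions that are not taken from the abstract: a positive predicted effect is read as a benefit, agreement is measured as the simple proportion of individuals who receive the same recommendation, and the function names, method names, and toy numbers are all hypothetical.

```python
from itertools import combinations

import numpy as np


def itr_from_effects(predicted_effects, threshold=0.0):
    """Recommend treatment (1) when the predicted effect exceeds the threshold.

    Assumption: larger predicted effects mean more benefit from treatment.
    """
    return (np.asarray(predicted_effects) > threshold).astype(int)


def pairwise_agreement(rules):
    """Proportion of individuals given the same recommendation, per pair of rules.

    `rules` maps a (hypothetical) method name to a 0/1 array of recommendations.
    """
    return {
        (a, b): float(np.mean(rules[a] == rules[b]))
        for a, b in combinations(rules, 2)
    }


# Toy data for four individuals (illustrative values only, not trial results).
rules = {
    # First class: ITR derived from predicted individualized treatment effects.
    "effect_model_a": itr_from_effects([0.2, -0.1, 0.3, -0.4]),
    "effect_model_b": itr_from_effects([0.1, 0.2, 0.5, -0.2]),
    # Second class: a rule estimated directly, given here as recommendations.
    "direct_rule": np.array([1, 0, 0, 0]),
}

agreement = pairwise_agreement(rules)
for pair, value in agreement.items():
    print(pair, value)
```

A chance-corrected measure such as Cohen's kappa could be substituted for the raw proportion; the abstract does not state which agreement measure was used.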