Survival analysis concerns the task of predicting the time until an event occurs. Widely used in medicine, it must handle incomplete (i.e., censored) data, for instance from patients who did not experience the event during the study period. For practical use, both accuracy and interpretability are important. Survival trees are easy-to-follow survival models that recursively split the patient cohort into discrete patient groups. Whilst survival trees can capture complex relationships, they typically need to grow large to do so, which threatens interpretability. Moreover, survival trees are often built with greedy approaches that may overlook globally optimal split combinations, limiting predictive performance. To achieve competitive accuracy, shallow survival trees require expressive, higher-order feature combinations. We therefore use genetic programming to multi-objectively evolve inherently inspectable feature sets and study how they interact with different tree induction strategies. We further introduce an evolutionary approach that jointly optimises the survival tree structure and the non-linear split logic. Our findings demonstrate that evolutionary feature construction improves predictive performance across different tree induction strategies on two real-world datasets and at two survival tree depths. Full joint evolution shows the highest overall potential to propose multiple inherently inspectable, shallow survival trees with good performance.
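To make the notions of censoring and cohort splitting concrete, the following is a minimal illustrative sketch, not the method proposed here. It assumes a standard Kaplan-Meier estimator for the per-group survival curve and a single depth-1 split on a hypothetical constructed feature (the product of two inputs). The toy cohort, the function names `kaplan_meier` and `split_cohort`, and the threshold are all made up for illustration.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate at each distinct event time.

    times:  observed follow-up time per patient
    events: 1 if the event was observed, 0 if the patient was censored
    Returns a list of (time, survival_probability) pairs.
    """
    observed = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # every patient leaves the risk set at their time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = observed.get(t, 0)
        if d:
            # Censored patients still count in the denominator until they leave.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


def split_cohort(rows, feature, threshold):
    """One tree split: rows with feature(row) <= threshold go left."""
    left = [r for r in rows if feature(r) <= threshold]
    right = [r for r in rows if feature(r) > threshold]
    return left, right


# Toy cohort with made-up values: (x1, x2, follow-up time, event flag).
cohort = [
    (1.0, 2.0, 2, 1),
    (0.5, 1.0, 3, 0),
    (2.0, 3.0, 3, 1),
    (0.2, 0.5, 5, 1),
    (0.1, 4.0, 8, 0),
]


def product_feature(row):
    """Hypothetical higher-order feature: the product of the two inputs."""
    return row[0] * row[1]


left, right = split_cohort(cohort, product_feature, threshold=1.0)
for name, leaf in (("left", left), ("right", right)):
    curve = kaplan_meier([r[2] for r in leaf], [r[3] for r in leaf])
    print(name, curve)
```

Each leaf of a survival tree can be summarised by such a curve; a shallow tree stays readable only if each split feature is itself expressive enough to separate the groups.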