Owing to their inherently interpretable structure, decision trees are commonly used in applications where interpretability is essential. Recent work has focused on improving various aspects of decision trees, including their predictive power and robustness; however, their instability, although well documented, has received comparatively little attention. In this paper, we take a step towards stabilizing decision tree models, focusing on real-world health care applications, where stability and interpretability are both especially relevant. We introduce a new distance metric for decision trees and use it to quantify a tree's level of stability. We propose a novel methodology for training stable decision trees and investigate the trade-offs inherent to decision tree models, including those between stability, predictive power, and interpretability. We demonstrate the value of the proposed methodology through an extensive quantitative and qualitative analysis of six case studies from real-world health care applications. We show that, on average, a small 4.6% decrease in predictive power buys a significant 38% improvement in the model's stability.
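The abstract does not specify the paper's distance metric, so the sketch below only illustrates the general idea of using a distance between trees to score stability. It assumes several trees have already been trained on perturbed versions of the same data. It also substitutes a simple stand-in distance: the fraction of evaluation points on which two trees predict different labels. Stability is then taken as one minus the mean pairwise distance. The tree representation and every function name (`predict`, `disagreement_distance`, `stability_score`) are hypothetical and are not the authors' method.

```python
from itertools import combinations
from typing import List, Sequence, Union

# Hypothetical tree representation (not from the paper): a leaf is an int class
# label; an internal node is a tuple (feature_index, threshold, left, right).
Tree = Union[int, tuple]


def predict(tree: Tree, x: Sequence[float]) -> int:
    """Route one sample down the tree until a leaf label is reached."""
    while not isinstance(tree, int):
        feature, threshold, left, right = tree
        tree = left if x[feature] <= threshold else right
    return tree


def disagreement_distance(t1: Tree, t2: Tree, X: List[Sequence[float]]) -> float:
    """Stand-in distance (an assumption): share of points where the trees disagree."""
    return sum(predict(t1, x) != predict(t2, x) for x in X) / len(X)


def stability_score(trees: List[Tree], X: List[Sequence[float]]) -> float:
    """1 minus the mean pairwise distance; 1.0 means all trees predict identically."""
    pairs = list(combinations(trees, 2))
    mean_dist = sum(disagreement_distance(a, b, X) for a, b in pairs) / len(pairs)
    return 1.0 - mean_dist


# Toy example: assume these trees came from retraining on resampled data.
tree_a = (0, 0.5, 0, 1)  # splits on feature 0
tree_b = (0, 0.5, 0, 1)  # identical structure to tree_a
tree_c = (1, 0.5, 0, 1)  # splits on feature 1 instead
X_eval = [(0.2, 0.9), (0.8, 0.1), (0.3, 0.2), (0.7, 0.8)]

print(round(stability_score([tree_a, tree_b, tree_c], X_eval), 3))  # prints 0.667
```

Because the distance compares predictions rather than structure, two trees with different splits but identical outputs would count as perfectly stable here. A structure-aware metric such as the one the paper proposes would not necessarily share that property.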