Inferring causal relationships in the decision-making processes of machine learning algorithms is a crucial step toward explainable Artificial Intelligence (AI). In this research, we introduce a novel causality measure and a distance metric, both derived from Lempel-Ziv (LZ) complexity. We explore how the proposed causality measure can be used in decision trees by splitting on the features that most strongly \textit{cause} the outcome. We further evaluate the causality-based decision tree and the distance-based decision tree against a traditional decision tree that uses Gini impurity. While the proposed methods demonstrate comparable classification performance overall, the causality-based decision tree significantly outperforms both the distance-based and the Gini-based decision trees on datasets generated from causal models. This result indicates that the proposed approach can capture insights beyond those of classical decision trees, especially in causally structured data. Based on the features used in the LZ causality-measure-based decision tree, we introduce a causal strength for each feature in the dataset in order to infer the predominant causal variables for the occurrence of the outcome.
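For readers unfamiliar with the quantity underlying both proposed measures, the following is a minimal sketch of the classical Lempel-Ziv (1976) complexity of a symbolic sequence, computed by exhaustive-history parsing. It illustrates LZ complexity only; it is not the causality measure or the distance metric introduced in this work, whose definitions are not given in the abstract. The function name `lz76_complexity` is a hypothetical choice made for this illustration.

```python
def lz76_complexity(s: str) -> int:
    """Count the phrases in the LZ76 exhaustive-history parsing of s.

    Illustrative sketch only: this is the standard LZ76 complexity,
    not the causality measure or distance metric proposed in the paper.
    """
    n = len(s)
    i = 0          # start index of the current phrase
    phrases = 0    # number of phrases found so far
    while i < n:
        length = 1
        # Grow the candidate phrase while it can still be copied from the
        # text seen so far (everything before the candidate's last symbol).
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        # The candidate is now the shortest new phrase, or the sequence
        # ended mid-phrase; either way it counts as one phrase.
        phrases += 1
        i += length
    return phrases


if __name__ == "__main__":
    # Parsed as 0 | 001 | 10 | 100 | 1000 | 101
    print(lz76_complexity("0001101001000101"))
```

A sequence with more regularity parses into fewer phrases, which is why LZ complexity is commonly used as a computable proxy for compressibility. This simple version runs in roughly quadratic time or worse; it is meant for clarity rather than efficiency.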