In the era of Explainable Artificial Intelligence, there is a renewed focus on single trees for their ease of interpretation. This paper introduces Simultaneous Latent Budget Trees (SLBT), a probabilistic machine learning framework for classification trees in the presence of a stratification factor, such as a temporal, spatial, or demographic variable, that acts as a control variable or potential confounder. Standard tree-growing procedures are not designed to optimize a conditional split rule. A model-based split rule is proposed in which child nodes are interpreted as latent components of a simultaneous mixture model, such as the Simultaneous Latent Budget Model or one of its constrained versions, fitted to the parent node. Mixing parameters drive the observations to the child nodes, differently for each group, whereas latent budget parameters update the response-class profile for each level of the control variable. Parameters are estimated by least squares, viewing the model from a neural network perspective. The resulting tree structure can be visualized interactively, with interpretation aids for nodes and paths, including visual pruning and a decision tree selection procedure. Suitable measures are proposed to handle an unbalanced response class distribution. The proposed methodology is applied to investigate gender-related differences in the disease progression of Amyotrophic Lateral Sclerosis. The SLBT library with the various tree-based algorithms is available in the linked GitHub repository.
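To make the split rule more concrete, the following is a minimal sketch of a latent budget fit at a single parent node, not the SLBT library's API. It assumes the standard latent budget form, in which each row of conditional response proportions is approximated as a mixture of K latent budgets, p[i][j] ≈ Σ_c a[i][c] · b[c][j], with the rows of a (mixing parameters) and b (latent budgets) constrained to sum to one. Everything else is an assumption made for illustration: the function names, the toy tables, the choice of gradient descent with backtracking as the least-squares solver, the separate (unconstrained) fit per stratum, and the rule that sends each predictor category to the child with the largest mixing weight.

```python
import math
import random


def softmax(row):
    """Map unconstrained reals to a probability vector."""
    top = max(row)
    exps = [math.exp(x - top) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Plain-Python matrix product; adequate for small tables."""
    cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def sq_loss(p, a, b):
    """Sum of squared residuals between observed and fitted profiles."""
    fit = matmul(a, b)
    return sum((x - f) ** 2 for pr, fr in zip(p, fit) for x, f in zip(pr, fr))


def logit_grad(probs, grad):
    """Chain rule through softmax: gradient with respect to the logits."""
    dot = sum(q * g for q, g in zip(probs, grad))
    return [q * (g - dot) for q, g in zip(probs, grad)]


def shifted(params, grads, step):
    return [[x - step * g for x, g in zip(pr, gr)] for pr, gr in zip(params, grads)]


def fit_lbm(p, k=2, iters=500, seed=0):
    """Least-squares fit of p[i][j] ~ sum_c a[i][c] * b[c][j] (hypothetical helper)."""
    rng = random.Random(seed)
    u = [[rng.gauss(0, 1) for _ in range(k)] for _ in p]      # logits of mixing parameters
    v = [[rng.gauss(0, 1) for _ in p[0]] for _ in range(k)]   # logits of latent budgets
    for _ in range(iters):
        a, b = [softmax(r) for r in u], [softmax(r) for r in v]
        loss = sq_loss(p, a, b)
        res = [[f - x for f, x in zip(fr, pr)] for fr, pr in zip(matmul(a, b), p)]
        # Gradients of the squared loss, up to a constant factor of 2.
        gu = [logit_grad(ar, gr) for ar, gr in zip(a, matmul(res, transpose(b)))]
        gv = [logit_grad(br, gr) for br, gr in zip(b, matmul(transpose(a), res))]
        step = 1.0
        for _ in range(40):  # backtracking: halve the step until the loss does not increase
            u_try, v_try = shifted(u, gu, step), shifted(v, gv, step)
            a_try, b_try = [softmax(r) for r in u_try], [softmax(r) for r in v_try]
            if sq_loss(p, a_try, b_try) <= loss:
                u, v = u_try, v_try
                break
            step /= 2
    return [softmax(r) for r in u], [softmax(r) for r in v]


# Toy data (invented): one table per level of the stratification factor.
# Rows are predictor categories, columns are response-class proportions.
tables = {
    "group_a": [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.2, 0.3, 0.5], [0.1, 0.2, 0.7]],
    "group_b": [[0.5, 0.3, 0.2], [0.2, 0.5, 0.3], [0.3, 0.4, 0.3], [0.1, 0.3, 0.6]],
}

fits = {}
for group, table in tables.items():
    a, b = fit_lbm(table)
    fits[group] = (a, b)
    # Assumed assignment rule: send each category to the child with the largest weight.
    children = [max(range(len(row)), key=row.__getitem__) for row in a]
    print(group, "child per predictor category:", children)
    print(group, "squared loss:", round(sq_loss(table, a, b), 4))
```

Fitting each stratum separately corresponds to the unconstrained case, where both the mixing parameters and the latent budgets may differ by group. The sketch does not attempt the constrained versions mentioned in the abstract. The softmax parameterization keeps every row on the probability simplex without explicit constraints, which is one way to read the neural network perspective, though the paper's actual estimation procedure may differ.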