A General Framework for Decision Trees via Bregman Divergences

Decision trees are among the fundamental tools of statistical learning because of their interpretability, their flexibility, and their ability to adapt to nonlinear structure. Among them, Classification and Regression Trees (CART), introduced by Breiman, Friedman, Olshen, and Stone in 1984, became one of the most influential algorithms and remain among the most widely used methods for classification and regression problems. Bregman divergences, introduced by Lev Bregman in 1967 in the context of convex optimization, provide a broad family of loss functions that naturally generalize the squared Euclidean distance. This family includes, among others, the Kullback-Leibler divergence, the Poisson divergence, and the Itakura-Saito divergence, as well as several losses associated with distributions belonging to the exponential family. Moreover, Bregman divergences possess a rich geometric structure and deep connections with convex analysis and information geometry. In this work, we propose a generalization of the CART paradigm based on Bregman divergences, thereby obtaining a broader family of decision trees adapted to different statistical models and underlying geometries. Although algorithms such as CART and classical implementations such as rpart incorporate several impurity criteria, these are usually introduced in an ad hoc manner for each specific model. In contrast, the Bregman divergence approach provides a unified framework in which these criteria can be derived and interpreted from common convex and geometric principles. Beyond the algorithmic construction, we also investigate theoretical properties of these trees. In particular, we study how properties of the generating convex function (such as strong convexity or smoothness) influence the impurity gain between parent and child nodes, as well as the stability and consistency of the estimator.
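
To make the idea of a divergence-based impurity concrete, the following is a minimal sketch, not the construction developed in this work. It assumes scalar responses, takes the node prediction to be the sample mean (which is known to minimize the average Bregman divergence to a fixed second argument), and defines the node impurity as the average divergence to that mean. The names `bregman`, `GENERATORS`, `impurity`, and `split_gain` are hypothetical and chosen only for illustration.

```python
import math

def bregman(phi, dphi, x, y):
    """Bregman divergence D_phi(x, y) = phi(x) - phi(y) - phi'(y) * (x - y)."""
    return phi(x) - phi(y) - dphi(y) * (x - y)

# Convex generators phi with their derivatives (illustrative choices).
# "poisson" and "itakura_saito" require strictly positive inputs.
GENERATORS = {
    # D(x, y) = (x - y)^2
    "squared": (lambda t: t * t, lambda t: 2.0 * t),
    # D(x, y) = x * log(x / y) - x + y
    "poisson": (lambda t: t * math.log(t) - t, lambda t: math.log(t)),
    # D(x, y) = x / y - log(x / y) - 1
    "itakura_saito": (lambda t: -math.log(t), lambda t: -1.0 / t),
}

def impurity(ys, phi, dphi):
    """Average divergence of the responses in a node to their mean."""
    mean = sum(ys) / len(ys)
    return sum(bregman(phi, dphi, y, mean) for y in ys) / len(ys)

def split_gain(left, right, phi, dphi):
    """Parent impurity minus the size-weighted impurity of the two children."""
    parent = left + right
    n = len(parent)
    children = (len(left) / n) * impurity(left, phi, dphi) \
             + (len(right) / n) * impurity(right, phi, dphi)
    return impurity(parent, phi, dphi) - children

if __name__ == "__main__":
    left, right = [1.0, 2.0], [3.0, 4.0]
    for name, (phi, dphi) in GENERATORS.items():
        print(name, split_gain(left, right, phi, dphi))
```

With the squared generator this impurity reduces to the within-node variance, so the gain coincides with the usual variance reduction of regression trees; the other generators change only the geometry in which the same split is scored.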
