This paper introduces the centroid decision forest (CDF), a novel ensemble learning framework that redefines the splitting strategy and tree-building process of ordinary decision trees for high-dimensional classification. The splitting approach in CDF differs from that of traditional decision trees in that a class separability score (CSS) selects the most discriminative features at each node, and these features are used to construct the centroids of the partitions (daughter nodes). The splitting criterion uses Euclidean distances from each class centroid, which yields a more flexible and robust splitting mechanism. Centroids are constructed by computing, for each class, the mean values of the selected features, ensuring a class-representative division of the feature space. This centroid-driven approach enables CDF to capture complex class structures while maintaining interpretability and scalability. To evaluate CDF, 23 high-dimensional datasets are used to compare its performance against state-of-the-art classifiers in terms of classification accuracy and Cohen's kappa statistic. The experimental results show that CDF outperforms the conventional methods, establishing its effectiveness and flexibility for high-dimensional classification problems.
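A single centroid-based split of the kind described above might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the CSS formula, so `separability_score` uses a ratio of between-class to within-class variance as a stand-in, and the function names and the parameter `k` (number of features kept per node) are hypothetical.

```python
import numpy as np


def separability_score(X, y):
    """Per-feature stand-in score (an assumption, NOT the paper's CSS):
    between-class variance divided by within-class variance."""
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        class_mean = Xc.mean(axis=0)
        between += len(Xc) * (class_mean - overall_mean) ** 2
        within += ((Xc - class_mean) ** 2).sum(axis=0)
    # Small constant avoids division by zero for constant features.
    return between / (within + 1e-12)


def centroid_split(X, y, k):
    """Split one node: keep the k highest-scoring features, build one
    centroid per class on those features, and send each sample to the
    partition of its nearest centroid (Euclidean distance)."""
    scores = separability_score(X, y)
    selected = np.argsort(scores)[::-1][:k]  # indices of the top-k features
    classes = np.unique(y)
    # Centroid of a class = mean of the selected features over its samples.
    centroids = np.stack(
        [X[y == c][:, selected].mean(axis=0) for c in classes]
    )
    # Pairwise distances, shape (n_samples, n_classes), via broadcasting.
    diffs = X[:, selected][:, None, :] - centroids[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    partition = dists.argmin(axis=1)  # daughter-node index per sample
    return selected, classes, centroids, partition
```

In a full tree, each partition would presumably be split again in the same way until a stopping rule is met, and a forest would aggregate many such trees; those steps are not specified in the abstract and are left out of the sketch.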