Decision trees are widely used in machine learning because they are simple to construct and easy to interpret. However, as data sets grow, traditional methods for constructing and retraining decision trees become increasingly slow, with running time that scales polynomially in the number of training examples. In this work, we introduce Des-q, a novel quantum algorithm for constructing and retraining decision trees for regression and binary classification tasks. Assuming the data stream produces small increments of new training examples, we show that Des-q significantly reduces the time required for tree retraining, achieving poly-logarithmic time complexity in the number of training examples, even when accounting for the time needed to load the new examples into quantum-accessible memory. Our approach builds a decision tree that performs k-piecewise linear splits at each internal node. Each such split generates multiple hyperplanes simultaneously, dividing the feature space into k distinct regions. To determine the k anchor points for these splits, we develop an efficient quantum supervised clustering method that builds on the q-means algorithm of Kerenidis et al. Des-q first efficiently estimates the weight of each feature using a novel quantum technique for estimating the Pearson correlation. It then uses weighted distance estimation to cluster the training examples into k disjoint regions, and expands the tree by repeating the same procedure. We benchmark a simulated version of our algorithm against a state-of-the-art classical decision tree for regression and binary classification on multiple data sets with numerical features. The results show that the proposed algorithm performs similarly to the state-of-the-art decision tree while significantly speeding up periodic tree retraining.
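The split procedure described in the abstract (feature weights from the Pearson correlation, followed by weighted-distance clustering into k regions) can be illustrated with a purely classical sketch. This is not the Des-q quantum algorithm: it uses ordinary Lloyd-style k-means iterations in place of q-means, and it assumes that each feature weight is the absolute value of the Pearson correlation between that feature and the label, which the abstract does not specify. The function names `pearson_feature_weights` and `weighted_kmeans_split`, and the parameters `n_iter` and `seed`, are hypothetical and chosen only for illustration.

```python
import numpy as np


def pearson_feature_weights(X, y):
    """Return |Pearson correlation| between each feature column and the label.

    Assumption: the absolute correlation is used as the feature weight.
    Constant features (zero variance) get weight 0.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    corr = np.divide(Xc.T @ yc, denom,
                     out=np.zeros(X.shape[1]), where=denom > 0)
    return np.abs(corr)


def _assign(X, centroids, w):
    """Assign each example to the centroid with the smallest weighted
    squared distance sum_j w_j * (x_j - c_j)^2."""
    diff = X[:, None, :] - centroids[None, :, :]
    dist = (w * diff ** 2).sum(axis=2)
    return dist.argmin(axis=1)


def weighted_kmeans_split(X, y, k, n_iter=20, seed=0):
    """Classical stand-in for one k-way split at a tree node.

    Returns the feature weights, the k anchor points (centroids),
    and the region index of every training example.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    w = pearson_feature_weights(X, y)
    # Initialise centroids with k distinct training examples.
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = _assign(X, centroids, w)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a region is empty
                centroids[j] = members.mean(axis=0)
    labels = _assign(X, centroids, w)
    return w, centroids, labels
```

In a full tree, each of the k regions returned here would become a child node, and the same split would be applied to the examples assigned to it until a stopping criterion is met.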