TREE: Tree Regularization for Efficient Execution

The rise of machine learning methods on heavily resource-constrained devices requires not only the choice of a model architecture suited to the target platform, but also the optimization of the chosen model with respect to inference execution time, so that the available resources are used as well as possible. Random forests and decision trees have been shown to be suitable models for such scenarios. They are highly tunable with respect to total model size, and they also offer high potential for optimizing their execution according to the underlying memory architecture. In addition to the straightforward strategy of enforcing shorter paths through decision trees, and hence reducing inference time, hardware-aware implementations can optimize execution time in an orthogonal manner. One particular hardware-aware optimization is to lay out the memory of decision trees so that more probable paths are less likely to be evicted from system caches. This works particularly well when splits within tree nodes are uneven, i.e., when one of the two child nodes is visited with high probability. In this paper, we present a method to reduce path lengths by rewarding uneven probability distributions during the training of decision trees, at the cost of a minimal accuracy degradation. Specifically, we regularize the impurity computation of the CART algorithm so that the evaluation of split criteria favors not only low impurity but also highly asymmetric distributions, which offers high optimization potential for a memory architecture-aware implementation. We show that, especially for binary classification data sets and data sets with many samples, this form of regularization can reduce the execution time by up to approximately a factor of four with a minimal accuracy degradation.
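The abstract does not state the exact form of the regularized impurity, so the following Python sketch only illustrates the general idea under a hypothetical assumption: the split score is the weighted Gini impurity used by CART minus a bonus `alpha * |n_left - n_right| / n` that grows with the imbalance between the two children. The names `regularized_score`, `best_threshold` and the parameter `alpha` are illustrative and do not come from the paper.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def gini(labels: Sequence[int]) -> float:
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def regularized_score(left: Sequence[int], right: Sequence[int], alpha: float) -> float:
    """Split score (lower is better): weighted child Gini minus an asymmetry bonus.

    HYPOTHETICAL regularizer, not the paper's formula: alpha * |n_left - n_right| / n,
    which is 0 for an even split and approaches 1 when almost all samples go to one child.
    """
    n = len(left) + len(right)
    impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
    asymmetry = abs(len(left) - len(right)) / n
    return impurity - alpha * asymmetry


def best_threshold(
    xs: Sequence[float], ys: Sequence[int], alpha: float
) -> Tuple[Optional[float], float]:
    """CART-style exhaustive threshold search on a single numeric feature."""
    pairs = sorted(zip(xs, ys))
    best_thr: Optional[float] = None
    best_score = float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal feature values cannot be separated by a threshold
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = regularized_score(left, right, alpha)
        if score < best_score:
            best_thr = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint threshold
            best_score = score
    return best_thr, best_score


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
    ys = [0, 0, 0, 0, 1, 1, 1, 1, 1]
    print(best_threshold(xs, ys, alpha=0.0))  # plain Gini -> (4.5, 0.0)
    print(best_threshold(xs, ys, alpha=1.0))  # strong bonus -> threshold 1.5
```

With `alpha = 0` the search reduces to plain Gini-based CART and picks the pure 4-vs-5 split at 4.5. With the deliberately large `alpha = 1.0`, used here only to make the effect visible, it accepts some impurity in exchange for a 1-vs-8 split in which almost every sample follows the same child. A much smaller `alpha` would presumably be tuned in practice to keep the accuracy loss minimal.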

翻译：在资源高度受限的设备上部署机器学习方法，不仅需要为目标平台选择合适的模型架构，还需针对推理执行时间对所选模型进行优化，以充分利用可用资源。随机森林与决策树被证明是适合此类场景的模型，因为它们不仅能在整体模型规模上进行深度调优，还具备根据底层内存架构优化执行过程的巨大潜力。除了通过强制缩短决策树路径以降低推理执行时间的直接策略外，硬件感知实现能以正交方式优化执行时间。一种特定的硬件感知优化策略是：通过合理布局决策树的内存结构，使高概率路径更不易被系统缓存驱逐。当树节点内的分割不均匀且访问某一子节点的概率较高时，该方法效果尤为显著。本文提出一种在训练决策树时通过奖励不均匀概率分布来缩短路径长度的方法，该方法仅以极小的精度损失为代价。具体而言，我们对CART算法中的不纯度计算进行正则化，使其在评估分割标准时不仅偏好低不纯度，同时倾向于高度不对称的分布，从而为内存架构感知的实现提供显著的优化潜力。实验表明，特别是在二分类数据集及大样本数据集上，此类正则化可使执行时间降低至约四分之一，而精度损失极小。