Deep learning models have become popular for tabular data analysis because they address the limitations of decision trees and enable valuable applications such as semi-supervised learning, online learning, and transfer learning. However, these deep learning approaches often face a trade-off. On the one hand, they can be computationally expensive on large-scale or high-dimensional datasets. On the other hand, they may lack interpretability and may not be suitable for small-scale datasets. In this study, we propose a novel interpretable neural network, the Neural Classification and Regression Tree (NCART), to overcome these challenges. NCART is a modified residual network that replaces fully-connected layers with multiple differentiable oblivious decision trees. By integrating decision trees into the architecture, NCART retains interpretability while benefiting from the end-to-end training capabilities of neural networks. The simplicity of the NCART architecture makes it well suited to datasets of varying sizes and reduces computational cost compared with state-of-the-art deep learning models. Extensive numerical experiments demonstrate that NCART outperforms existing deep learning models, establishing it as a strong competitor to tree-based models.
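To make the central building block concrete, the following is a minimal NumPy sketch of a differentiable oblivious decision tree: every level of the tree shares one soft (sigmoid-gated) split over a softmax-weighted feature combination, so all 2^depth leaves are reachable with smooth probabilities and the whole layer is trainable by gradient descent. The class name, parameterization, and gating details here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class ObliviousTree:
    """Sketch of a depth-d differentiable oblivious decision tree.

    "Oblivious" means each level applies the same split to every node at
    that level, so the tree needs only d splits for 2**d leaves.
    NOTE: hypothetical reconstruction for illustration; NCART's exact
    parameterization may differ.
    """

    def __init__(self, n_features, depth, seed=0):
        rng = np.random.default_rng(seed)
        self.depth = depth
        # One soft feature-selection vector and one threshold per level.
        self.feature_logits = rng.normal(size=(depth, n_features))
        self.thresholds = rng.normal(size=depth)
        # One response value per leaf.
        self.leaf_values = rng.normal(size=2 ** depth)

    def _leaf_probs(self, x):
        # Soft routing decision at each level, shared across the level.
        gates = np.empty(self.depth)
        for level in range(self.depth):
            w = softmax(self.feature_logits[level])  # soft feature choice
            gates[level] = sigmoid(w @ x - self.thresholds[level])
        # Probability of reaching each leaf = product of gate probs
        # along its root-to-leaf path (bit k of the leaf index encodes
        # the left/right choice at level k).
        probs = np.ones(2 ** self.depth)
        for leaf in range(2 ** self.depth):
            for level in range(self.depth):
                p = gates[level]
                probs[leaf] *= p if (leaf >> level) & 1 else 1.0 - p
        return probs

    def forward(self, x):
        # Differentiable output: expected leaf value under soft routing.
        return float(self._leaf_probs(x) @ self.leaf_values)
```

Because every operation is smooth, gradients flow through the routing probabilities to the feature-selection logits and thresholds, which is what lets such trees replace fully-connected layers inside an end-to-end network.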