Oblique decision trees combine the transparency of trees with the power of multivariate decision boundaries, but learning high-quality oblique splits is NP-hard, and practical methods still rely on slow search or theory-free heuristics. We present the Hinge Regression Tree (HRT), which reframes each split as a non-linear least-squares problem over two linear predictors whose max/min envelope induces ReLU-like expressive power. The resulting alternating fitting procedure is exactly equivalent to a damped Newton (Gauss-Newton) method within fixed partitions. We analyze this node-level optimization and, for a backtracking line-search variant, prove that the local objective decreases monotonically and converges; in practice, both fixed and adaptive damping yield fast, stable convergence and can be combined with optional ridge regularization. We further prove that HRT's model class is a universal approximator with an explicit $O(\delta^2)$ approximation rate, and show on synthetic and real-world benchmarks that it matches or outperforms single-tree baselines with more compact structures.
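To make the node-level procedure concrete, the following is a minimal sketch of one way the alternating fit could look for a single split, assuming the max envelope of two linear predictors, a small ridge penalty, and a simple halving backtracking rule. The function names (`hinge_loss`, `fit_hinge_node`), the random initialization, and all default parameter values are hypothetical choices made for illustration and are not taken from the paper.

```python
import numpy as np

def hinge_loss(X1, y, theta, lam):
    # Ridge-regularized least-squares loss of the max envelope of two linear predictors.
    pred = np.maximum(X1 @ theta[0], X1 @ theta[1])
    resid = pred - y
    return 0.5 * resid @ resid + 0.5 * lam * np.sum(theta ** 2)

def fit_hinge_node(X, y, n_iter=50, lam=1e-6, seed=0):
    # Hypothetical helper: alternating fit of one hinge split (illustrative only).
    rng = np.random.default_rng(seed)
    X1 = np.column_stack([X, np.ones(len(X))])   # append a bias column
    d = X1.shape[1]
    theta = rng.normal(scale=0.1, size=(2, d))   # assumed random initialization
    history = [hinge_loss(X1, y, theta, lam)]
    for _ in range(n_iter):
        # Fix the partition: which predictor is active (attains the max) at each point.
        a_active = X1 @ theta[0] >= X1 @ theta[1]
        target = theta.copy()
        for k, mask in enumerate((a_active, ~a_active)):
            if mask.any():                       # leave an inactive predictor unchanged
                G = X1[mask].T @ X1[mask] + lam * np.eye(d)
                target[k] = np.linalg.solve(G, X1[mask].T @ y[mask])
        # With the partition fixed, the loss is quadratic, so the move toward the
        # per-partition ridge solution is a Newton direction; mu is the damping factor.
        direction = target - theta
        mu, accepted = 1.0, False
        while mu > 1e-8:
            cand = theta + mu * direction
            cand_loss = hinge_loss(X1, y, cand, lam)
            if cand_loss < history[-1]:          # accept only strict decrease
                theta, accepted = cand, True
                history.append(cand_loss)
                break
            mu *= 0.5                            # backtrack by halving the step
        if not accepted:
            break                                # no improving step found: stop
    return theta, history

# Toy check on y = |x| = max(x, -x), which a single hinge can represent.
x = np.linspace(-1.0, 1.0, 200)
theta, history = fit_hinge_node(x[:, None], np.abs(x))
print(history[0], history[-1])
```

Because a step is accepted only when the loss strictly decreases, the recorded `history` is monotone by construction, which mirrors the behavior claimed for the backtracking variant. Replacing the line search with a constant `mu` would give a fixed-damping version, for which this sketch offers no such guarantee.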