We study when geometric simplicity of decision boundaries, used here as a notion of interpretability, can conflict with accurate approximation of axis-aligned decision trees by shallow neural networks. Decision trees induce rule-based, axis-aligned decision regions (finite unions of boxes), whereas shallow ReLU networks are typically trained as score models whose predictions are obtained by thresholding. We analyze the infinite-width, bounded-norm, single-hidden-layer ReLU class through the Radon total variation ($\mathrm{RTV}$) seminorm, which controls the geometric complexity of level sets. We first show that the hard tree indicator $1_A$ has infinite $\mathrm{RTV}$. Moreover, two natural split-wise continuous surrogates, piecewise-linear ramp smoothing and sigmoidal (logistic) smoothing, also have infinite $\mathrm{RTV}$ in dimensions $d>1$, while Gaussian convolution yields finite $\mathrm{RTV}$, but with an explicit exponential dependence on $d$. We then separate two goals that are often conflated: classification after thresholding (recovering the decision set) and score learning (learning a calibrated score close to $1_A$). For classification, we construct a smooth barrier score $S_A$ with finite $\mathrm{RTV}$ whose fixed threshold $\tau = 1$ exactly recovers the box. Under a mild tube-mass condition near $\partial A$, we prove an $L_1(P)$ calibration bound that decays polynomially in a sharpness parameter, together with an explicit $\mathrm{RTV}$ upper bound in terms of face measures. Experiments on synthetic unions of rectangles illustrate the resulting accuracy--complexity tradeoff and show how threshold selection shifts where trained models land along it.
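As a purely illustrative companion to the classification-versus-score-learning distinction above (not the paper's construction of $S_A$ or its experimental protocol), the following NumPy/SciPy sketch smooths the indicator of a single synthetic box by Gaussian convolution and compares the $L_1$ score error against $1_A$ with the $0$/$1$ error after thresholding; the grid resolution, the bandwidth values, and the $0.5$ threshold are all hypothetical choices made for this sketch.

\begin{verbatim}
import numpy as np
from scipy.ndimage import gaussian_filter

# Illustrative sketch only: a 2-D axis-aligned box A, its hard indicator 1_A,
# and Gaussian-smoothed surrogate scores at several bandwidths. We compare
# (i) the L1 score error against 1_A with (ii) the 0/1 error after
# thresholding the score at 0.5. All names and constants are hypothetical.

n = 512  # grid resolution on the unit square
xs = np.linspace(0.0, 1.0, n)
X, Y = np.meshgrid(xs, xs, indexing="ij")

# Box A = [0.3, 0.7] x [0.3, 0.7]; indicator 1_A evaluated on the grid.
indicator = ((X >= 0.3) & (X <= 0.7) & (Y >= 0.3) & (Y <= 0.7)).astype(float)

for sigma_cells in (2, 8, 32):  # smoothing radius, in grid cells
    score = gaussian_filter(indicator, sigma=sigma_cells)
    l1_error = np.mean(np.abs(score - indicator))             # score-learning loss
    cls_error = np.mean((score >= 0.5) != (indicator > 0.5))  # after thresholding
    print(f"sigma={sigma_cells:3d}  L1={l1_error:.4f}  0/1={cls_error:.4f}")
\end{verbatim}

On this symmetric example the thresholded set recovers the box almost exactly (up to slight corner rounding) even as the $L_1$ score error grows with the bandwidth, which is the sense in which the two goals can come apart.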