We study multigrade deep learning (MGDL) as a principled framework for structured error refinement in deep neural networks. While the approximation power of neural networks is now relatively well understood, training very deep architectures remains challenging due to highly nonconvex and often ill-conditioned optimization landscapes. In contrast, for relatively shallow networks, most notably certain one-hidden-layer ReLU models, training admits convex reformulations with global guarantees under appropriate settings, motivating learning paradigms that improve stability while scaling to depth. MGDL builds on this insight by training deep networks grade by grade: previously learned grades are frozen, and each newly added grade-wise subnetwork is composed on top of the previously learned grades and trained to fit the residual left by the current approximation, yielding a structured and interpretable hierarchical refinement process. We develop an operator-theoretic foundation for MGDL and prove that, for any continuous target function defined on a hypercube, there exists a fixed-width multigrade ReLU scheme whose residuals are pointwise nonincreasing in magnitude and converge uniformly to zero, with strict $L^p$-norm decay at every nontrivial grade for $p\in [1,\infty)$. To the best of our knowledge, this work provides the first rigorous constructive approximation guarantee showing that a grade-wise residual refinement scheme can achieve vanishing error in a fixed-width multigrade ReLU architecture.
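As an illustration of the grade-by-grade procedure described above, the following is a minimal sketch and not the construction used in the proofs. It assumes a simplified setting: each grade adds one hidden ReLU layer with random, untrained weights on top of the frozen features of the earlier grades, and only that grade's linear output layer is fit to the current residual, by least squares. The function name `fit_multigrade` and the parameters `num_grades`, `width`, and `seed` are hypothetical. Because a zero output layer is always feasible, the root-mean-square residual on the sample points cannot increase from one grade to the next; this is weaker than the pointwise monotonicity stated in the abstract.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def fit_multigrade(x, y, num_grades=4, width=16, seed=0):
    """Grade-wise residual fitting (illustrative assumption, not the paper's scheme).

    Returns the accumulated prediction and the RMS residual after each grade.
    """
    rng = np.random.default_rng(seed)
    features = x                      # input to the first grade, shape (n, d)
    residual = y.copy()               # current error, shape (n,)
    prediction = np.zeros_like(y)
    history = [float(np.sqrt(np.mean(residual ** 2)))]

    for _ in range(num_grades):
        # New hidden layer composed on top of the frozen earlier grades.
        fan_in = features.shape[1]
        W = rng.normal(size=(fan_in, width)) / np.sqrt(fan_in)
        b = rng.normal(size=width)
        hidden = relu(features @ W + b)

        # Fit this grade's linear output layer (with bias) to the residual.
        design = np.hstack([hidden, np.ones((hidden.shape[0], 1))])
        coef, *_ = np.linalg.lstsq(design, residual, rcond=None)
        grade_output = design @ coef

        # Update the approximation and the residual the next grade will see.
        prediction = prediction + grade_output
        residual = residual - grade_output
        history.append(float(np.sqrt(np.mean(residual ** 2))))

        # Freeze this grade: its hidden features feed the next grade.
        features = hidden

    return prediction, history


# Usage: approximate a continuous target on [0, 1].
x = np.linspace(0.0, 1.0, 200).reshape(-1, 1)
y = np.sin(2.0 * np.pi * x[:, 0])
prediction, history = fit_multigrade(x, y)
print([round(h, 4) for h in history])
```

Least squares is swapped in here for gradient-based training purely to keep the example short and deterministic; each grade's fit is a convex problem, which loosely mirrors the motivation from shallow-network training given in the abstract.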