We study multigrade deep learning (MGDL) as a principled framework for structured error refinement in deep neural networks. While the approximation power of neural networks is now relatively well understood, training very deep architectures remains challenging due to highly non-convex and often ill-conditioned optimization landscapes. In contrast, for relatively shallow networks, most notably one-hidden-layer $\texttt{ReLU}$ models, training admits convex reformulations with global guarantees, motivating learning paradigms that improve stability while scaling to depth. MGDL builds on this insight by training deep networks grade by grade: previously learned grades are frozen, and each new residual block is trained solely to reduce the remaining approximation error, yielding an interpretable and stable hierarchical refinement process. We develop an operator-theoretic foundation for MGDL and prove that, for any continuous target function, there exists a fixed-width multigrade $\texttt{ReLU}$ scheme whose residuals decrease strictly across grades and converge uniformly to zero. To the best of our knowledge, this work provides the first rigorous theoretical guarantee that grade-wise training yields provably vanishing approximation error in deep networks. Numerical experiments further illustrate the theoretical results.
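The grade-wise refinement described above can be sketched in a few lines. This is a minimal illustrative toy, not the paper's scheme: here each grade is a one-hidden-layer ReLU block whose hidden weights are drawn at random and whose output weights are fit to the current residual by least squares, whereas the full method trains all block parameters. Because zero output weights are always feasible, each new grade can only shrink the residual, mirroring the strict grade-wise error decrease.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def new_grade(x, residual, width=16):
    """Fit one one-hidden-layer ReLU block to the current residual.

    Hidden weights/biases are random; only the output weights are solved
    by least squares, so the new residual norm never exceeds the old one
    (the zero solution is always feasible).
    """
    W = rng.normal(size=(width, 1))
    b = rng.uniform(-1.0, 1.0, size=width)   # kinks inside the data range
    h = relu(x @ W.T + b)                    # (n, width) feature matrix
    v, *_ = np.linalg.lstsq(h, residual, rcond=None)
    return W, b, v

def predict(grades, x):
    """Sum the outputs of all previously learned (frozen) grades."""
    out = np.zeros(x.shape[0])
    for W, b, v in grades:
        out += relu(x @ W.T + b) @ v
    return out

# Toy continuous target on [-1, 1].
x = np.linspace(-1.0, 1.0, 200).reshape(-1, 1)
f = np.sin(3.0 * x[:, 0])

grades, rmse = [], []
for k in range(3):
    residual = f - predict(grades, x)        # error left by frozen grades
    grades.append(new_grade(x, residual))    # train only the new block
    rmse.append(float(np.sqrt(np.mean((f - predict(grades, x)) ** 2))))

print(rmse)  # residuals shrink grade by grade
```

Initializing each grade so that it starts from the zero function is what makes the refinement monotone: training can only improve on the residual left behind, which is the mechanism the uniform-convergence result formalizes.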