We study multigrade deep learning (MGDL) as a principled framework for structured error refinement in deep neural networks. While the approximation power of neural networks is now relatively well understood, training very deep architectures remains challenging because the optimization landscape is highly non-convex and often ill-conditioned. In contrast, training relatively shallow networks, most notably one-hidden-layer $\texttt{ReLU}$ models, admits convex reformulations with global guarantees, which motivates learning paradigms that retain this stability while scaling to depth. MGDL builds on this insight by training deep networks grade by grade: previously learned grades are frozen, and each new residual block is trained solely to reduce the remaining approximation error, yielding an interpretable and stable hierarchical refinement process. We develop an operator-theoretic foundation for MGDL and prove that, for any continuous target function, there exists a fixed-width multigrade $\texttt{ReLU}$ scheme whose residuals decrease strictly across grades and converge uniformly to zero. To the best of our knowledge, this is the first rigorous guarantee that grade-wise training drives the approximation error of deep networks to zero. Numerical experiments illustrate the theoretical results.
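The grade-by-grade idea can be illustrated with a minimal sketch. This is not the scheme analyzed in the paper: the function name `fit_multigrade`, the width, the number of grades, and the target function are all hypothetical, and as a simplifying assumption the hidden weights of each grade are drawn at random and frozen, so that only the linear output layer is fitted (a convex least-squares problem). Each grade takes the frozen hidden features of the previous grade as input and is fitted only to the current residual.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def fit_multigrade(x, y, n_grades=4, width=16, seed=0):
    """Fit y grade by grade; return the RMS residual after each grade.

    Assumption (illustration only): hidden weights are random and frozen,
    and only the output layer of each grade is solved by least squares.
    """
    rng = np.random.default_rng(seed)
    features = x.reshape(-1, 1).astype(float)  # input to the first grade
    residual = y.astype(float).copy()          # error left to explain
    errors = [float(np.sqrt(np.mean(residual ** 2)))]

    for _ in range(n_grades):
        fan_in = features.shape[1]
        # New fixed-width hidden layer stacked on the frozen features.
        W = rng.normal(size=(fan_in, width)) / np.sqrt(fan_in)
        b = rng.normal(size=width)
        hidden = relu(features @ W + b)

        # Output layer (with bias column) fitted to the residual only.
        design = np.hstack([hidden, np.ones((hidden.shape[0], 1))])
        coef, *_ = np.linalg.lstsq(design, residual, rcond=None)

        # Freeze this grade: subtract its output and pass its features on.
        residual = residual - design @ coef
        features = hidden
        errors.append(float(np.sqrt(np.mean(residual ** 2))))

    return errors


# Usage: a smooth 1-D target sampled on a grid (hypothetical example).
x = np.linspace(0.0, 1.0, 200)
y = np.sin(2.0 * np.pi * x)
for grade, err in enumerate(fit_multigrade(x, y)):
    print(f"after grade {grade}: RMS residual = {err:.4f}")
```

Because the zero output layer is always a feasible least-squares solution, the RMS residual on the sample points cannot increase from one grade to the next in this sketch. That is a much weaker statement than the strict decrease and uniform convergence proved in the paper, which concern the sup norm and a constructed scheme rather than random features.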