To investigate the theoretical foundations of deep learning from the viewpoint of the minimum description length (MDL) principle, we analyse risk bounds of MDL estimators based on two-stage codes for simple two-layer neural networks (NNs) with ReLU activation. For that purpose, we propose a method to design two-stage codes for linear regression models and establish an upper bound on the risk of the corresponding MDL estimators, building on the theory of MDL estimators originating with Barron and Cover (1991). We then apply this result to simple two-layer NNs with ReLU activation consisting of $d$ nodes in the input layer, $m$ nodes in the hidden layer, and one output node. Since the object of estimation in our setting is only the $m$ weights from the hidden layer to the output node, this is an instance of a linear regression model. As a result, we show that the redundancy of the obtained two-stage codes is small owing to the strongly biased eigenvalue distribution of the Fisher information matrix of these NNs, recently established by Takeishi et al. (2023). That is, we obtain a tight upper bound on the risk of our MDL estimators. Note that our risk bound, whose leading term is $O(d^2 \log n / n)$, is independent of the number of parameters $m$.
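The reduction to linear regression stated above can be made concrete: with the input-to-hidden weights fixed, the network output $f(x; w) = w^\top \mathrm{ReLU}(Ax)$ is linear in the $m$ hidden-to-output weights $w$, so the hidden activations play the role of a design matrix. The following minimal NumPy sketch illustrates this under illustrative assumptions (the matrix `A`, the toy dimensions, and the least-squares fit are stand-ins for exposition, not the paper's MDL estimator):

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, n = 3, 5, 10                  # input dim, hidden width, sample size (toy values)

A = rng.standard_normal((m, d))     # fixed input-to-hidden weights (not estimated)
X = rng.standard_normal((n, d))     # n input vectors in R^d

# Hidden-layer activations ReLU(Ax) form an n x m design matrix:
Phi = np.maximum(X @ A.T, 0.0)

# f(x; w) = w^T ReLU(Ax) is linear in w, so estimating the m output
# weights is an ordinary linear regression problem on Phi.
w_true = rng.standard_normal(m)
y = Phi @ w_true                    # noiseless targets for this sanity check

# Least-squares fit over the m output weights (the only free parameters here)
w_hat, *_ = np.linalg.lstsq(Phi, y, rcond=None)

print(np.allclose(Phi @ w_hat, y))  # the linear fit reproduces the targets
```

Since the targets lie in the column space of `Phi`, the least-squares solution recovers them exactly; only the $m$ output weights ever enter the estimation problem, matching the setting in which the risk bound is derived.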