Generalization theory has been established for sparse deep neural networks in the high-dimensional regime. Beyond generalization, parameter estimation also matters, since it underpins variable selection and the interpretability of deep neural networks. Existing theoretical studies of parameter estimation focus mainly on two-layer neural networks, because the convergence of parameter estimates relies heavily on the regularity of the Hessian matrix, whereas the Hessian matrix of a deep neural network is highly singular. To avoid the unidentifiability of deep neural networks in parameter estimation, we propose nonparametric estimation of the partial derivatives with respect to the inputs. We first show that model convergence of sparse deep neural networks is guaranteed, in the sense that the sample complexity grows only with the logarithm of the number of parameters or of the input dimension when the $\ell_{1}$-norm of the parameters is well constrained. Then, by bounding the norm and the divergence of the partial derivatives, we establish that nonparametric estimation of the partial derivatives converges at the rate $\mathcal{O}(n^{-1/4})$, which is slower than the model convergence rate $\mathcal{O}(n^{-1/2})$. To the best of our knowledge, this study is the first to combine nonparametric estimation with parametric sparse deep neural networks. Because nonparametric estimation of partial derivatives is of great significance for nonlinear variable selection, these results point to a promising future for the interpretability of deep neural networks.
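To make the two quantities in the abstract concrete, the following is a minimal sketch, not the authors' estimator. It builds a small tanh network, computes the $\ell_{1}$-norm of its parameters, and approximates the partial derivatives of the output with respect to each input by central finite differences. The network sizes, the random initialization, the step size `eps`, and all function names (`init_params`, `forward`, `l1_norm`, `input_gradient`) are assumptions chosen for illustration. The first-layer weights attached to the third input are zeroed by hand to mimic an irrelevant variable.

```python
import math
import random


def init_params(sizes, seed=0):
    """Random dense layers as (W, b) pairs; sizes and seed are illustrative."""
    rng = random.Random(seed)
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        b = [0.0] * n_out
        params.append((W, b))
    return params


def forward(params, x):
    """Scalar output of a tanh network with a linear last layer."""
    h = list(x)
    last = len(params) - 1
    for i, (W, b) in enumerate(params):
        z = [sum(w * v for w, v in zip(row, h)) + bj for row, bj in zip(W, b)]
        h = z if i == last else [math.tanh(v) for v in z]
    return h[0]


def l1_norm(params):
    """Sum of absolute values of all weights and biases."""
    total = 0.0
    for W, b in params:
        total += sum(abs(w) for row in W for w in row)
        total += sum(abs(v) for v in b)
    return total


def input_gradient(params, x, eps=1e-5):
    """Central finite-difference estimate of d f(x) / d x_j for each input j."""
    grad = []
    for j in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        grad.append((forward(params, x_plus) - forward(params, x_minus)) / (2 * eps))
    return grad


params = init_params([3, 4, 4, 1])
# Assumption for illustration: make the third input irrelevant by zeroing
# every first-layer weight attached to it.
for row in params[0][0]:
    row[2] = 0.0

grad = input_gradient(params, [0.3, -0.2, 0.9])
print("l1 norm of parameters:", l1_norm(params))
print("partial derivatives w.r.t. inputs:", grad)
```

In this sketch the partial derivative with respect to the third input vanishes because the network does not depend on it, which is the intuition behind using input derivatives for nonlinear variable selection. In practice one would likely use automatic differentiation rather than finite differences.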