In practice, deeper networks tend to be more powerful than shallow ones, but this phenomenon is not yet well understood theoretically. In this paper, we show that a three-layer network with a matrix exponential activation function, $$ f(X)=W_3\exp(W_2\exp(W_1X)), \quad X\in \mathbb{C}^{d\times d}, $$ admits an analytical solution for the weights $W_1, W_2, W_3$ of the system $$ Y_1=f(X_1),\quad Y_2=f(X_2) $$ for given $X_1,X_2,Y_1,Y_2$, under invertibility assumptions only. Our proof demonstrates the power of depth and of a nonlinear activation function, since a one-layer network can solve only a single equation, $Y=WX$.
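The contrast between one and two equations can be made concrete with a small numerical sketch. It is not the paper's analytical construction of $W_1, W_2, W_3$. It only evaluates $f$ and shows that the one-layer solution $W = Y_1X_1^{-1}$ fits one pair but generically not a second. The helper names (`expm`, `three_layer`, `one_layer_fit`) are hypothetical. The matrix exponential is an assumed stand-in: a plain scaling-and-squaring Taylor approximation, used in place of a library routine such as `scipy.linalg.expm`.

```python
import numpy as np


def expm(a: np.ndarray, terms: int = 30) -> np.ndarray:
    """Matrix exponential by scaling and squaring with a truncated Taylor series.

    Hypothetical helper: a simple stand-in for a library routine.
    """
    n = a.shape[0]
    norm = np.linalg.norm(a, ord=1)
    # Pick s so that ||a / 2**s||_1 <= 0.5, where the Taylor series converges fast.
    s = max(0, int(np.ceil(np.log2(norm / 0.5)))) if norm > 0 else 0
    scaled = a / (2 ** s)
    result = np.eye(n, dtype=complex)
    term = np.eye(n, dtype=complex)
    for k in range(1, terms):
        term = term @ scaled / k  # term = scaled**k / k!
        result = result + term
    # Undo the scaling: exp(a) = (exp(a / 2**s)) ** (2**s).
    for _ in range(s):
        result = result @ result
    return result


def three_layer(w1, w2, w3, x):
    """Forward pass f(X) = W3 exp(W2 exp(W1 X)) as written in the abstract."""
    return w3 @ expm(w2 @ expm(w1 @ x))


def one_layer_fit(x, y):
    """Solve W X = Y for W, assuming X is invertible (W = Y X^{-1}).

    Uses X^T W^T = Y^T so that np.linalg.solve can be applied directly.
    """
    return np.linalg.solve(x.T, y.T).T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 3

    def rand_c():
        # Random complex d x d matrix (invertible with probability one).
        return rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))

    x1, x2, y1, y2 = (rand_c() for _ in range(4))
    w = one_layer_fit(x1, y1)
    print(np.allclose(w @ x1, y1))  # the first equation is fitted exactly
    print(np.allclose(w @ x2, y2))  # the second generically fails: W is already fixed
```

Because $X_1$ is invertible, $W$ is uniquely determined by the first pair, so a second independent pair $(X_2, Y_2)$ is generally inconsistent with it. This is the limitation of one layer that the abstract contrasts with the three-layer result.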