In a recent paper, Ling et al. studied the over-parametrized Deep Equilibrium Model (DEQ) with ReLU activation and proved that gradient descent converges to a globally optimal solution at a linear rate for the quadratic loss function. In this paper, we show that this result still holds for DEQs with any general activation function that has bounded first and second derivatives. Since the new activation function is generally non-linear, we design a general population Gram matrix and develop a new form of dual activation based on Hermite polynomial expansion.
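As an illustration of the Hermite-expansion idea, the sketch below computes the classical dual activation of a smooth activation function. It is not the new form developed in this paper. It assumes the standard definition: if the activation has the expansion σ(x) = Σ a_n h_n(x) in normalized probabilists' Hermite polynomials h_n, then its dual is Σ a_n² ρ^n, which equals E[σ(u)σ(v)] for standard Gaussian u, v with correlation ρ. The function names `hermite_coefficients` and `dual_activation`, the truncation order, and the quadrature size are illustrative choices only.

```python
import math

import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval


def hermite_coefficients(sigma, n_terms=30, n_quad=80):
    """Approximate a_n = E[sigma(g) h_n(g)] for g ~ N(0, 1), n < n_terms.

    h_n = He_n / sqrt(n!) is the normalized probabilists' Hermite polynomial.
    The expectation is estimated with Gauss-Hermite quadrature.
    """
    x, w = hermegauss(n_quad)            # nodes/weights for weight exp(-x^2 / 2)
    w = w / math.sqrt(2.0 * math.pi)     # rescale so the weights sum to 1
    fx = sigma(x)
    coeffs = np.empty(n_terms)
    for n in range(n_terms):
        basis = np.zeros(n + 1)
        basis[n] = 1.0                   # coefficient vector selecting He_n
        h_n = hermeval(x, basis) / math.sqrt(math.factorial(n))
        coeffs[n] = np.sum(w * fx * h_n)
    return coeffs


def dual_activation(coeffs, rho):
    """Evaluate the truncated series sum_n a_n^2 * rho^n (classical dual)."""
    rho = np.asarray(rho, dtype=float)
    powers = rho[..., None] ** np.arange(len(coeffs))
    return powers @ (coeffs ** 2)


# Example: tanh has bounded first and second derivatives.
a = hermite_coefficients(np.tanh)
print(dual_activation(a, np.linspace(0.0, 1.0, 5)))
```

Because every squared coefficient is non-negative, the truncated series is non-decreasing in ρ on [0, 1], and its value at ρ = 1 approximates E[σ(g)²] from below.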