In a recent paper, Ling et al. investigated the over-parametrized Deep Equilibrium Model (DEQ) with ReLU activation. They proved that gradient descent converges to a globally optimal solution at a linear rate under the quadratic loss. This paper shows that the same result holds for DEQs with any general activation whose first and second derivatives are bounded. Since such an activation is generally non-homogeneous, bounding the least eigenvalue of the Gram matrix at the equilibrium point is particularly challenging. To accomplish this task, we construct a novel population Gram matrix and develop a new form of dual activation based on Hermite polynomial expansion.
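To make the Hermite-expansion machinery concrete, the sketch below numerically computes the Hermite coefficients of a smooth activation and evaluates the associated dual activation. This is an illustrative example, not the paper's construction: the activation tanh, the truncation order, and the quadrature size are all assumptions, and the dual activation here is the standard form $\check\sigma(\rho)=\sum_k a_k^2\,\rho^k$, where $a_k=\mathbb{E}_{g\sim\mathcal N(0,1)}[\sigma(g)\,h_k(g)]$ and $h_k$ is the $k$-th orthonormal probabilists' Hermite polynomial.

```python
import math
import numpy as np

def hermite_coeffs(sigma, n_coeffs=12, n_quad=100):
    """Hermite coefficients a_k = E[sigma(g) h_k(g)], g ~ N(0,1),
    via Gauss-Hermite quadrature for the weight exp(-x^2/2)."""
    x, w = np.polynomial.hermite_e.hermegauss(n_quad)
    w = w / np.sqrt(2.0 * np.pi)  # normalize weights to the standard Gaussian measure
    coeffs = np.empty(n_coeffs)
    for k in range(n_coeffs):
        c = np.zeros(k + 1)
        c[k] = 1.0
        He_k = np.polynomial.hermite_e.hermeval(x, c)   # probabilists' He_k
        h_k = He_k / np.sqrt(math.factorial(k))          # orthonormal version
        coeffs[k] = np.sum(w * sigma(x) * h_k)
    return coeffs

def dual_activation(coeffs, rho):
    """Dual activation: sum_k a_k^2 rho^k, for correlation rho in [-1, 1]."""
    ks = np.arange(len(coeffs))
    return float(np.sum(coeffs**2 * rho**ks))
```

At $\rho = 1$ the dual activation recovers $\mathbb{E}[\sigma(g)^2]$ (up to truncation error), which gives a quick sanity check on the expansion; for an odd activation like tanh, all even coefficients vanish.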