In a recent paper, Ling et al. investigated the over-parametrized Deep Equilibrium Model (DEQ) with ReLU activation and proved that gradient descent converges to a globally optimal solution of the quadratic loss at a linear rate. This paper shows that the result still holds for DEQs with any general bounded activation that has bounded first and second derivatives. Because such an activation is generally non-homogeneous, bounding the least eigenvalue of the Gram matrix of the equilibrium point is particularly challenging. To accomplish this, we construct a novel population Gram matrix and develop a new form of dual activation based on Hermite polynomial expansion.
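To make the object under study concrete, the following is a minimal numerical sketch of a DEQ equilibrium point and the least eigenvalue of its empirical Gram matrix. It is an illustration under stated assumptions, not the construction used in the paper: the parametrization z = φ(Wz + Ux), the choice φ = tanh (one example of a bounded activation with bounded first and second derivatives), the rescaling of W to spectral norm 0.5 so that the fixed-point iteration contracts, and the names `deq_equilibrium`, `W`, `U`, `X` are all hypothetical.

```python
import numpy as np


def deq_equilibrium(W, U, X, tol=1e-10, max_iter=1000):
    """Solve Z = tanh(Z W^T + X U^T) by fixed-point iteration.

    Each row of X is one input and each row of Z its equilibrium point.
    Assumption: the spectral norm of W is below 1, so the map is a
    contraction (tanh is 1-Lipschitz) and the iteration converges.
    """
    Z = np.zeros((X.shape[0], W.shape[0]))
    for _ in range(max_iter):
        Z_next = np.tanh(Z @ W.T + X @ U.T)
        if np.max(np.abs(Z_next - Z)) < tol:
            return Z_next
        Z = Z_next
    return Z


rng = np.random.default_rng(0)
n, d, m = 5, 3, 200  # samples, input dimension, width (illustrative sizes)

# Unit-norm inputs (an assumed normalization).
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)

# Random weights; W is rescaled to spectral norm 0.5 (assumed scaling).
W = rng.standard_normal((m, m))
W *= 0.5 / np.linalg.norm(W, 2)
U = rng.standard_normal((m, d))

Z = deq_equilibrium(W, U, X)

# Empirical Gram matrix of the equilibrium points and its least eigenvalue.
G = Z @ Z.T / m
lam_min = np.linalg.eigvalsh(G)[0]  # eigvalsh returns ascending eigenvalues
print("least eigenvalue of the empirical Gram matrix:", lam_min)
```

The quantity `lam_min` is the empirical counterpart of the eigenvalue that the convergence analysis needs to bound away from zero. The sketch only evaluates it numerically for one random draw and says nothing about the population-level bound.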