In a recent paper, Ling et al. investigated the over-parametrized Deep Equilibrium Model (DEQ) with ReLU activation and proved that gradient descent converges to a globally optimal solution at a linear rate for the quadratic loss function. This paper shows that the result still holds for DEQs with any general activation that has bounded first and second derivatives. Because such an activation is generally non-linear, bounding the least eigenvalue of the Gram matrix of the equilibrium point is particularly challenging. To accomplish this, we construct a novel population Gram matrix and develop a new form of dual activation based on Hermite polynomial expansion.
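To make the objects named above concrete, here is a minimal numerical sketch of a DEQ equilibrium point and the empirical Gram matrix of those equilibrium points. It is an illustration under stated assumptions, not the paper's construction. The abstract does not give the parametrization, so the form z = act(W z + U x), the choice of tanh (one activation with bounded first and second derivatives), the rescaling of W to spectral norm 0.5, the 1/m normalization of the Gram matrix, and all names (`deq_equilibrium`, `W`, `U`, `X`, `G`) are hypothetical.

```python
import numpy as np


def deq_equilibrium(W, U, x, act=np.tanh, tol=1e-10, max_iter=1000):
    """Solve z = act(W @ z + U @ x) by plain fixed-point iteration.

    Assumed form of the DEQ layer; converges when the map is a contraction.
    """
    z = np.zeros(W.shape[0])
    for _ in range(max_iter):
        z_next = act(W @ z + U @ x)
        if np.linalg.norm(z_next - z) < tol:
            return z_next
        z = z_next
    return z


rng = np.random.default_rng(0)
n, d, m = 5, 3, 200  # number of samples, input dimension, width (all illustrative)

# Rescale W to spectral norm 0.5. Since |tanh'| <= 1, the update map is then
# a contraction, so the equilibrium exists and is unique (assumed setup).
W = rng.standard_normal((m, m))
W *= 0.5 / np.linalg.norm(W, 2)
U = rng.standard_normal((m, d)) / np.sqrt(d)

# Unit-norm inputs, one per row.
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)

# Equilibrium point for each input, stacked into an (n, m) matrix.
Z = np.stack([deq_equilibrium(W, U, x) for x in X])

# Empirical Gram matrix of the equilibrium points and its least eigenvalue.
G = Z @ Z.T / m
lambda_min = np.linalg.eigvalsh(G)[0]  # eigvalsh returns ascending eigenvalues
print("least eigenvalue of the Gram matrix:", lambda_min)
```

A convergence analysis of the kind the abstract describes lower-bounds this least eigenvalue. The sketch only computes it for one random draw and says nothing about the population Gram matrix or the Hermite-based dual activation.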