Deep equilibrium (DEQ) models are widely recognized as a memory-efficient alternative to standard neural networks, achieving state-of-the-art performance in language modeling and computer vision tasks. Instead of explicitly computing the output, these models solve a fixed-point equation, which sets them apart from standard neural networks. However, existing DEQ models often lack formal guarantees of the existence and uniqueness of the fixed point, and the convergence of the numerical scheme used for computing the fixed point is not formally established. As a result, DEQ models are potentially unstable in practice. To address these drawbacks, we introduce a novel class of DEQ models called positive concave deep equilibrium (pcDEQ) models. Our approach, which is based on nonlinear Perron-Frobenius theory, enforces nonnegative weights and activation functions that are concave on the positive orthant. By imposing these constraints, we can easily ensure the existence and uniqueness of the fixed point without relying on additional complex assumptions commonly found in the DEQ literature, such as those based on monotone operator theory in convex analysis. Furthermore, the fixed point can be computed with the standard fixed-point algorithm, and we provide theoretical guarantees of geometric convergence, which, in particular, simplifies the training process. Experiments demonstrate the competitiveness of our pcDEQ models against other implicit models.
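To make the construction concrete, the following is a minimal NumPy sketch of a layer with nonnegative weights and an activation that is concave on the positive orthant, solved with the standard fixed-point iteration. It is an illustration under stated assumptions, not the architecture used in the experiments: the choice of `tanh` as the activation, the strictly positive vector `c` (standing in for a bias plus input injection), the random weights, and the names `pc_layer` and `fixed_point` are all hypothetical.

```python
import numpy as np


def pc_layer(x, W, c):
    """One application of the map f(x) = tanh(W x + c).

    Assumption for this sketch: tanh is used because it is concave and
    increasing on the nonnegative reals; c plays the role of a strictly
    positive bias / input-injection term.
    """
    return np.tanh(W @ x + c)


def fixed_point(W, c, x0, tol=1e-12, max_iter=1000):
    """Standard fixed-point iteration x_{k+1} = f(x_k).

    Returns the approximate fixed point and the number of iterations used.
    """
    if np.any(W < 0) or np.any(c <= 0):
        raise ValueError("sketch assumes W >= 0 entrywise and c > 0")
    x = np.asarray(x0, dtype=float)
    for k in range(1, max_iter + 1):
        x_next = pc_layer(x, W, c)
        # Stop once successive iterates agree to within tol (max norm).
        if np.max(np.abs(x_next - x)) < tol:
            return x_next, k
        x = x_next
    raise RuntimeError("fixed-point iteration did not converge")


# Hypothetical problem data: nonnegative weights, strictly positive offset.
rng = np.random.default_rng(0)
n = 4
W = rng.uniform(0.0, 1.0, size=(n, n))
c = 0.1 + rng.uniform(0.0, 1.0, size=n)

# Two very different nonnegative starting points.
x_a, iters_a = fixed_point(W, c, np.zeros(n))
x_b, iters_b = fixed_point(W, c, 5.0 * np.ones(n))

print("fixed point from zeros:", x_a, "in", iters_a, "iterations")
print("fixed point from 5*ones:", x_b, "in", iters_b, "iterations")
print("max difference:", np.max(np.abs(x_a - x_b)))
```

In this sketch both starting points should reach the same strictly positive vector, which is the behavior the uniqueness and convergence guarantees describe; a trained model would additionally need a mechanism for keeping the weights nonnegative during optimization, which is not shown here.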