Deep equilibrium (DEQ) models are widely recognized as a memory-efficient alternative to standard neural networks, achieving state-of-the-art performance in language modeling and computer vision tasks. Unlike standard neural networks, these models do not compute their output explicitly; instead, they define it as the solution of a fixed-point equation. However, existing DEQ models often lack formal guarantees that the fixed point exists and is unique, and the convergence of the numerical scheme used to compute it is not formally established. As a result, DEQ models are potentially unstable in practice. To address these drawbacks, we introduce a novel class of DEQ models called positive concave deep equilibrium (pcDEQ) models. Our approach, which is based on nonlinear Perron-Frobenius theory, enforces nonnegative weights and activation functions that are concave on the positive orthant. By imposing these constraints, we can easily ensure the existence and uniqueness of the fixed point without relying on the additional, more complex assumptions common in the DEQ literature, such as those based on monotone operator theory in convex analysis. Furthermore, the fixed point can be computed with the standard fixed-point algorithm, and we provide theoretical guarantees of its geometric convergence, which in particular simplifies the training process. Experiments demonstrate that pcDEQ models are competitive with other implicit models.
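The interplay between the constraints and the fixed-point iteration can be illustrated with a toy example. The following is a minimal sketch, not the authors' implementation. It assumes a hypothetical single-layer map T(z) = σ(Wz + u) with σ(t) = 1 − exp(−t). This activation was chosen here only because it is concave and nondecreasing on [0, ∞); the activations and parametrization actually used for pcDEQ models may differ. The vector u stands in for the input injection (for example Ux + b) and is assumed entrywise positive. The function names `pc_layer` and `fixed_point` are illustrative.

```python
import math
import random


def pc_layer(z, W, u):
    """Hypothetical pcDEQ-style map T(z) = sigma(W z + u), sigma(t) = 1 - exp(-t).

    Assumes W is entrywise nonnegative and u is entrywise positive, so that
    W z + u > 0 whenever z >= 0. Because sigma is concave and nondecreasing,
    T is concave and monotone on the nonnegative orthant and maps into (0, 1)^n.
    """
    n = len(z)
    return [1.0 - math.exp(-(sum(W[i][j] * z[j] for j in range(n)) + u[i]))
            for i in range(n)]


def fixed_point(W, u, z0, tol=1e-12, max_iter=1000):
    """Plain fixed-point iteration z_{k+1} = T(z_k).

    Returns the last iterate and the history of sup-norm step sizes.
    """
    z = list(z0)
    residuals = []
    for _ in range(max_iter):
        z_next = pc_layer(z, W, u)
        res = max(abs(a - b) for a, b in zip(z_next, z))
        residuals.append(res)
        z = z_next
        if res < tol:
            break
    return z, residuals


random.seed(0)
n = 4
# Nonnegative weights: the sign constraint assumed for this toy layer.
W = [[random.uniform(0.0, 0.5) for _ in range(n)] for _ in range(n)]
# Positive offset standing in for the input injection (assumption).
u = [random.uniform(0.1, 1.0) for _ in range(n)]

# Start from two very different points to illustrate uniqueness.
z_a, hist_a = fixed_point(W, u, [0.0] * n)
z_b, hist_b = fixed_point(W, u, [5.0] * n)

print("fixed point from 0:", [round(v, 6) for v in z_a])
print("fixed point from 5:", [round(v, 6) for v in z_b])
print("iterations:", len(hist_a), len(hist_b))
```

Because T is monotone, the iterates started at 0 increase, and the iterates started at 5 decrease, since T maps everything into (0, 1)^n. Both sequences meet at the same point, consistent with the uniqueness result. In a trained model, the same iteration would run inside the forward pass, and the nonnegativity of W would be maintained during training, for example by projecting the weights onto the nonnegative orthant after each update. That projection step is an assumption of this sketch.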