Positive concave deep equilibrium models

Deep equilibrium (DEQ) models are widely recognized as a memory-efficient alternative to standard neural networks, achieving state-of-the-art performance in language modeling and computer vision tasks. Instead of computing the output explicitly, these models solve a fixed-point equation, which sets them apart from standard neural networks. However, existing DEQ models often lack formal guarantees of the existence and uniqueness of the fixed point, and the convergence of the numerical scheme used to compute it is not formally established. As a result, DEQ models are potentially unstable in practice. To address these drawbacks, we introduce a novel class of DEQ models called positive concave deep equilibrium (pcDEQ) models. Our approach, which is based on nonlinear Perron-Frobenius theory, enforces nonnegative weights and activation functions that are concave on the positive orthant. These constraints ensure the existence and uniqueness of the fixed point without relying on the additional complex assumptions commonly found in the DEQ literature, such as those based on monotone operator theory in convex analysis. Furthermore, the fixed point can be computed with the standard fixed-point algorithm, and we provide theoretical guarantees of its geometric convergence, which, in particular, simplifies the training process. Experiments demonstrate the competitiveness of our pcDEQ models against other implicit models.
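
To make the fixed-point computation concrete, the following is a minimal NumPy sketch of the standard fixed-point iteration on a layer with nonnegative weights and an activation that is concave on the nonnegative orthant. It is an illustration under stated assumptions, not the authors' implementation: the layer form `tanh(W @ x + c)`, the choice of `tanh`, the positive offset `c` (standing in for an input injection plus bias), and the names `pc_layer` and `fixed_point` are all hypothetical.

```python
import numpy as np


def pc_layer(x, W, c):
    """Hypothetical positive concave layer: tanh(W x + c).

    Assumes W >= 0 elementwise and c > 0, so the map is positive and
    monotone for x >= 0; tanh is concave and increasing on [0, inf).
    """
    return np.tanh(W @ x + c)


def fixed_point(W, c, x0=None, tol=1e-10, max_iter=1000):
    """Plain fixed-point iteration x <- pc_layer(x), stopped on a small update."""
    x = np.zeros_like(c) if x0 is None else x0
    for k in range(max_iter):
        x_new = pc_layer(x, W, c)
        if np.max(np.abs(x_new - x)) < tol:
            return x_new, k + 1
        x = x_new
    return x, max_iter


rng = np.random.default_rng(0)
n = 8
W = np.abs(rng.normal(size=(n, n)))    # nonnegative weights (assumed constraint)
c = np.abs(rng.normal(size=n)) + 0.1   # strictly positive offset (assumption)

# Run the iteration from two different starting points.
x_star, iters = fixed_point(W, c)
x_star_alt, _ = fixed_point(W, c, x0=np.ones(n))

# Residual of the fixed-point equation x = pc_layer(x) at the returned point.
residual = np.max(np.abs(pc_layer(x_star, W, c) - x_star))
```

Starting from two different initial points and comparing `x_star` with `x_star_alt` is a simple numerical check of the uniqueness claim, and `residual` measures how well the returned point satisfies the fixed-point equation. In a trained model the nonnegativity of `W` would have to be enforced during optimization (for example by projection or reparameterization), which this sketch does not cover.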

