Robustifying State-space Models for Long Sequences via Approximate Diagonalization

State-space models (SSMs) have recently emerged as a framework for learning long-range sequence tasks. An example is the structured state-space sequence (S4) layer, which uses the diagonal-plus-low-rank structure of the HiPPO initialization framework. However, the complicated structure of the S4 layer poses challenges; and, in an effort to address these challenges, models such as S4D and S5 have considered a purely diagonal structure. This choice simplifies the implementation, improves computational efficiency, and allows channel communication. However, diagonalizing the HiPPO framework is itself an ill-posed problem. In this paper, we propose a general solution for this and related ill-posed diagonalization problems in machine learning. We introduce a generic, backward-stable "perturb-then-diagonalize" (PTD) methodology, which is based on the pseudospectral theory of non-normal operators, and which may be interpreted as the approximate diagonalization of the non-normal matrices defining SSMs. Based on this, we introduce the S4-PTD and S5-PTD models. Through theoretical analysis of the transfer functions of different initialization schemes, we demonstrate that the S4-PTD/S5-PTD initialization strongly converges to the HiPPO framework, while the S4D/S5 initialization only achieves weak convergences. As a result, our new models show resilience to Fourier-mode noise-perturbed inputs, a crucial property not achieved by the S4D/S5 models. In addition to improved robustness, our S5-PTD model averages 87.6% accuracy on the Long-Range Arena benchmark, demonstrating that the PTD methodology helps to improve the accuracy of deep learning models.

翻译：状态空间模型（SSM）近期成为学习长程序列任务的一种框架。例如结构化状态空间序列（S4）层，其利用HiPPO初始化框架的对角加低秩结构。然而，S4层的复杂结构带来了挑战；为应对这些挑战，S4D和S5等模型采用了纯对角结构。这一选择简化了实现、提高了计算效率，并允许通道间通信。但HiPPO框架的对角化本身是一个病态问题。本文针对这一机器学习中的相关病态对角化问题提出通用解决方案。我们引入一种基于非正规算子赝谱理论的通用后向稳定"扰动-再对角化"（PTD）方法论，可解释为对定义SSM的非正规矩阵进行近似对角化。基于此，我们提出S4-PTD和S5-PTD模型。通过分析不同初始化方案的传递函数，我们证明S4-PTD/S5-PTD初始化强收敛于HiPPO框架，而S4D/S5初始化仅实现弱收敛。因此，我们的新模型对傅里叶模式噪声扰动输入具有鲁棒性，这是S4D/S5模型未能实现的关键特性。除增强鲁棒性外，我们的S5-PTD模型在Long-Range Arena基准测试中平均准确率达87.6%，表明PTD方法论有助于提升深度学习模型的精度。

相关内容

MoDELS

关注 45

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

【CVPR 2022】一个完全无监督的框架，从噪声和部分测量中学习图像，Robust Equivariant Imaging: a fully unsupervised framework for learning to image

专知会员服务

25+阅读 · 2022年3月3日

UCM《机器学习导论笔记》，80页pdf CSE176 Introduction to Machine Learning

专知会员服务

32+阅读 · 2021年9月29日

【亚马逊-WWW2020】不解析,生成!用于面向任务的语义分析的序列到序列体系结构，Don't Parse, Generate! A Sequence to Sequence Architecture for Task-Oriented Semantic Parsing

专知会员服务

15+阅读 · 2020年2月1日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

35+阅读 · 2019年10月18日