We propose a novel methodology to solve a key eigenvalue optimization problem which arises in the contractivity analysis of neural ODEs. When studying contractivity properties of a one-layer weight-tied neural ODE $\dot{u}(t)=\sigma(Au(t)+b)$ (with $u,b \in {\mathbb R}^n$, $A$ a given $n \times n$ matrix, and $\sigma : {\mathbb R} \to {\mathbb R}$ an activation function, so that for a vector $z \in {\mathbb R}^n$, $\sigma(z) \in {\mathbb R}^n$ is interpreted entry-wise), we are led to study the logarithmic norm of a set of products of the type $D A$, where $D$ is a diagonal matrix such that ${\mathrm{diag}}(D) \in \sigma'({\mathbb R}^n)$. Specifically, given a real number $c$ (usually $c=0$), the problem consists of finding the largest interval $\text{I}\subseteq [0,\infty)$ such that the logarithmic norm $\mu(DA) \le c$ for all diagonal matrices $D$ with $D_{ii}\in \text{I}$. We propose a two-level nested methodology: an inner level where, for a given $\text{I}$, we compute an optimizer $D^\star(\text{I})$ by a gradient-system approach, and an outer level where we tune $\text{I}$ so that the value $c$ is attained by $\mu(D^\star(\text{I})A)$. We extend the proposed two-level approach to the general multilayer, and possibly time-dependent, case $\dot{u}(t) = \sigma( A_k(t) \ldots \sigma ( A_{1}(t) u(t) + b_{1}(t) ) \ldots + b_{k}(t) )$ and we present several numerical examples to illustrate its behaviour, including its stabilizing performance on a one-layer neural ODE applied to the classification of the MNIST handwritten digits dataset.
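As a concrete illustration of the quantity being optimized (not part of the paper's method): for the Euclidean norm, the logarithmic norm of a matrix $M$ is $\mu_2(M) = \lambda_{\max}\big((M+M^\top)/2\big)$. The sketch below, with a hypothetical matrix `A`, checks $\mu_2(DA) \le c$ by brute force over a grid of diagonal matrices $D$ with $D_{ii} \in \text{I}$; the paper's inner level replaces this naive grid search with a gradient-system optimizer.

```python
import numpy as np

def log_norm_2(M):
    # Logarithmic 2-norm: largest eigenvalue of the symmetric part (M + M^T)/2.
    return np.max(np.linalg.eigvalsh((M + M.T) / 2))

# Hypothetical example matrix A (for illustration only, not from the paper).
A = np.array([[-2.0, 1.0],
              [0.5, -3.0]])

def max_log_norm_on_grid(A, lo, hi, m=20):
    # Worst-case mu_2(D A) over an m-point grid of diagonal matrices D with
    # D_ii in the candidate interval I = [lo, hi].  A gradient-system inner
    # level would find the maximizing D far more efficiently in high dimension.
    n = A.shape[0]
    grid = np.linspace(lo, hi, m)
    worst = -np.inf
    for d in np.stack(np.meshgrid(*[grid] * n), axis=-1).reshape(-1, n):
        worst = max(worst, log_norm_2(np.diag(d) @ A))
    return worst

# Contractivity with level c holds on I if max_log_norm_on_grid(A, lo, hi) <= c.
```

An outer bisection on `hi` (fixing `lo = 0`) then locates the largest interval on which the worst-case logarithmic norm equals the target value $c$.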