We propose a novel method for solving a key eigenvalue optimization problem that arises in the contractivity analysis of neural ODEs. Consider the one-layer weight-tied neural ODE $\dot{u}(t)=\sigma(Au(t)+b)$, where $u,b \in {\mathbb R}^n$, $A$ is a given $n \times n$ matrix, $\sigma : {\mathbb R} \to {\mathbb R}^+$ is an activation function, and $\sigma(z) \in {\mathbb R}^n$ is understood entry-wise for a vector $z \in {\mathbb R}^n$. Studying its contractivity properties leads to the logarithmic norm of a set of products of the form $DA$, where $D$ is a diagonal matrix with ${\mathrm{diag}}(D) \in \sigma'({\mathbb R}^n)$. Specifically, given a real number $c$ (usually $c=0$), the problem is to find the largest positive interval $\chi\subseteq [0,\infty)$ such that the logarithmic norm satisfies $\mu(DA) \le c$ for all diagonal matrices $D$ with $D_{ii}\in \chi$. We propose a two-level nested method: at the inner level, for a given $\chi$, we compute an optimizer $D^\star(\chi)$ by a gradient system approach; at the outer level, we tune $\chi$ so that $\mu(D^\star(\chi)A)$ reaches the value $c$. We extend the two-level approach to the general multilayer, and possibly time-dependent, case $\dot{u}(t) = \sigma( A_k(t) \ldots \sigma ( A_{1}(t) u(t) + b_{1}(t) ) \ldots + b_{k}(t) )$ and present several numerical examples to illustrate its behaviour, including its stabilizing performance on a one-layer neural ODE applied to the classification of the MNIST handwritten digits dataset.
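The two-level structure can be illustrated with a small sketch. It is not the method of the paper, and it rests on several assumptions that the abstract does not state: the logarithmic norm is taken to be the one induced by the 2-norm, $\mu_2(M)=\lambda_{\max}\big((M+M^\top)/2\big)$; the interval is parametrized as $\chi=[m,1]$ with the upper end fixed; and the inner level uses brute-force enumeration of the vertices of the box $[m,1]^n$ in place of the gradient system approach. Vertex enumeration is exact here because $\mu_2(DA)$ is convex in ${\mathrm{diag}}(D)$, but it is only feasible for small $n$. The outer level is a plain bisection on $m$. All function names are hypothetical.

```python
import itertools
import numpy as np


def log_norm_2(M):
    """Logarithmic 2-norm: largest eigenvalue of the symmetric part of M."""
    return np.linalg.eigvalsh(0.5 * (M + M.T))[-1]


def worst_log_norm(A, m, upper=1.0):
    """Inner level (brute-force stand-in, NOT the gradient system approach).

    Returns the maximum of mu_2(D A) over diagonal D with entries in [m, upper].
    mu_2(D A) is convex in diag(D), so the maximum over the box is attained
    at one of its 2^n vertices; this is only practical for small n.
    """
    n = A.shape[0]
    return max(
        log_norm_2(np.diag(d) @ A)
        for d in itertools.product((m, upper), repeat=n)
    )


def smallest_lower_bound(A, c=0.0, upper=1.0, tol=1e-9):
    """Outer level: bisection on m so that max mu_2(D A) over [m, upper]^n is <= c.

    Assumes chi = [m, upper] (an assumption of this sketch). The inner maximum
    is non-increasing in m because the box shrinks as m grows.
    """
    if worst_log_norm(A, upper, upper) > c:
        raise ValueError("mu_2(upper * A) > c: no admissible interval of this form")
    if worst_log_norm(A, 0.0, upper) <= c:
        return 0.0
    lo, hi = 0.0, upper  # invariant: worst(lo) > c >= worst(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if worst_log_norm(A, mid, upper) <= c:
            hi = mid
        else:
            lo = mid
    return hi


# Toy 2x2 example (hypothetical data, chosen so the answer can be checked by hand).
A = np.array([[-1.0, 1.0],
              [0.0, -1.0]])
m_star = smallest_lower_bound(A, c=0.0)
print(f"smallest m with mu_2(DA) <= 0 on [m, 1]^2: {m_star:.6f}")
```

For this toy matrix the symmetric part of $DA$ is negative semidefinite exactly when $d_2 \ge d_1/4$, so the worst case over $[m,1]^2$ is $d_1=1$, $d_2=m$, and the bisection should approach $m=1/4$.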