Neural ordinary differential equations (neural ODEs) have emerged as a natural tool for supervised learning from a control perspective, yet a complete understanding of their optimal architecture remains elusive. In this work, we examine the interplay between their width $p$ and the number of layer transitions $L$ (effectively, the depth $L+1$). Specifically, we assess model expressivity in terms of its capacity to interpolate either a finite dataset $D$ comprising $N$ pairs of points or two probability measures in $\mathbb{R}^d$ within a Wasserstein error margin $\varepsilon>0$. Our findings reveal a balancing trade-off between $p$ and $L$, with $L$ scaling as $O(1+N/p)$ for dataset interpolation, and $L=O\left(1+(p\varepsilon^d)^{-1}\right)$ for measure interpolation. The autonomous case, where $L=0$, requires a separate analysis, which we undertake with a focus on dataset interpolation. We address the relaxed problem of $\varepsilon$-approximate controllability and establish an error decay of $\varepsilon\sim O(\log(p)\,p^{-1/d})$. This decay rate follows from applying a universal approximation theorem to a custom-built Lipschitz vector field that interpolates $D$. In the high-dimensional setting, we further demonstrate that $p=O(N)$ neurons are likely sufficient to achieve exact control.
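For concreteness, a minimal sketch of the model class these scalings refer to, assuming the standard width-$p$ neural ODE formulation common in this literature (the activation $\sigma$ and the control names $w_i,a_i,b_i$ are our notation, not fixed by the text above):
$$
\dot{x}(t)=\sum_{i=1}^{p} w_i(t)\,\sigma\bigl(\langle a_i(t),x(t)\rangle+b_i(t)\bigr),\qquad t\in[0,T],
$$
where the controls $(w_i,a_i,b_i)_{i=1}^{p}$ are taken piecewise constant with $L$ switching times, so the trajectory is governed by $L+1$ constant pieces (the effective depth); $L=0$ recovers the autonomous case, in which the vector field is fixed over the whole time horizon.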