Neural ordinary differential equations (neural ODEs) have emerged as a natural tool for supervised learning from a control perspective, yet their optimal architecture is still not fully understood. In this work, we examine the interplay between their width $p$ and their number of layer transitions $L$ (the effective depth being $L+1$). Specifically, we assess the model's expressivity through its capacity to interpolate either a finite dataset $D$ of $N$ pairs of points, or two probability measures in $\mathbb{R}^d$ up to a Wasserstein error $\varepsilon>0$. Our findings reveal a trade-off between $p$ and $L$: $L$ scales as $O(1+N/p)$ for dataset interpolation and as $O\left(1+(p\varepsilon^d)^{-1}\right)$ for measure interpolation. The autonomous case $L=0$ requires a separate study, which we carry out for dataset interpolation. There we address the relaxed problem of $\varepsilon$-approximate controllability and establish an error decay of $\varepsilon\sim O(\log(p)p^{-1/d})$. This rate follows from applying a universal approximation theorem to a custom-built Lipschitz vector field that interpolates $D$. In the high-dimensional setting, we further show that $p=O(N)$ neurons are likely sufficient to achieve exact control.
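To make the roles of $p$ and $L$ concrete, here is a minimal sketch of a neural ODE whose parameters are piecewise constant in time. It assumes the common parametrization $\dot x = W(t)\,\sigma(A(t)x + b(t))$ with $W(t)\in\mathbb{R}^{d\times p}$, $A(t)\in\mathbb{R}^{p\times d}$, $b(t)\in\mathbb{R}^p$, switching $L$ times on $[0,T]$. The ReLU activation, equal-length time intervals, forward-Euler integrator, random Gaussian parameters, and the class name `PiecewiseConstantNODE` are illustrative assumptions, not details taken from the text; in practice the parameters would be chosen (trained or constructed) to steer the data.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def relu(z: float) -> float:
    return max(z, 0.0)


def matvec(M: Matrix, v: Sequence[float]) -> Vector:
    # Plain matrix-vector product on nested lists.
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


class PiecewiseConstantNODE:
    """Hypothetical sketch: x'(t) = W_k relu(A_k x + b_k) on the k-th interval.

    p = width (number of neurons), L = number of layer transitions,
    so there are L + 1 parameter sets (L = 0 is the autonomous case).
    """

    def __init__(self, d: int, p: int, L: int, T: float = 1.0, seed: int = 0):
        rng = random.Random(seed)
        self.d, self.p, self.L, self.T = d, p, L, T
        self.params: List[Tuple[Matrix, Matrix, Vector]] = []
        for _ in range(L + 1):
            W = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(d)]  # d x p
            A = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(p)]  # p x d
            b = [rng.gauss(0.0, 1.0) for _ in range(p)]                      # p
            self.params.append((W, A, b))

    def field(self, k: int, x: Sequence[float]) -> Vector:
        # Vector field active on the k-th time interval.
        W, A, b = self.params[k]
        hidden = [relu(z + b_i) for z, b_i in zip(matvec(A, x), b)]
        return matvec(W, hidden)

    def flow(self, x0: Sequence[float], steps_per_interval: int = 100) -> Vector:
        # Forward-Euler integration over L + 1 equal-length intervals of [0, T].
        x = list(x0)
        dt = self.T / ((self.L + 1) * steps_per_interval)
        for k in range(self.L + 1):
            for _ in range(steps_per_interval):
                v = self.field(k, x)
                x = [x_i + dt * v_i for x_i, v_i in zip(x, v)]
        return x

    def num_parameters(self) -> int:
        # Each interval carries W (d*p), A (p*d) and b (p).
        return (self.L + 1) * (2 * self.d * self.p + self.p)


model = PiecewiseConstantNODE(d=2, p=4, L=3)
print(model.flow([0.5, -0.2]))   # image of one point under the flow map
print(model.num_parameters())    # 4 * (2*2*4 + 4) = 80
```

Setting `L=0` yields a single time-independent vector field, i.e. the autonomous case; each additional layer transition adds another $2dp+p$ parameters under this parametrization.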