Neural networks are the state of the art for many approximation tasks in high-dimensional spaces, as supported by abundant experimental evidence. However, we still lack a solid theoretical understanding of what they can approximate and, more importantly, at what cost and with what accuracy. One architecture of practical use, especially for approximation tasks involving images, is the convolutional (residual) network. Because the linear operators in these networks act locally, their analysis is more complicated than that of generic fully connected neural networks. This paper focuses on sequence approximation tasks in which each observation is represented by a matrix or a higher-order tensor. We show that relatively small networks suffice when approximating sequences arising from space-time discretisations of PDEs. We derive these results constructively by exploiting connections between discrete convolution and finite difference operators. Throughout, we design our network architectures so that they come with guarantees while remaining similar to those typically adopted in practice for sequence approximation tasks. Our theoretical results are supported by numerical experiments simulating linear advection, the heat equation, and the Fisher equation. The implementation is available in the repository associated with the paper.
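To illustrate the connection between discrete convolution and finite difference operators mentioned above, the following minimal sketch applies the standard second-order central difference stencil as a convolution kernel and uses it for one explicit Euler step of the 1D heat equation. It is not the implementation from the paper's repository: the function names `conv_periodic` and `heat_step`, the periodic boundary conditions, the diffusivity `nu`, and the grid and time-step sizes are all assumptions chosen for illustration.

```python
import numpy as np


def conv_periodic(u, kernel):
    """Slide an odd-length kernel over a 1D signal with periodic padding.

    Hypothetical helper: the periodic boundary is an assumption of this sketch.
    """
    r = len(kernel) // 2
    padded = np.concatenate([u[-r:], u, u[:r]])
    # 'valid' mode returns one output per original grid point;
    # np.correlate does not flip the kernel.
    return np.correlate(padded, kernel, mode="valid")


def heat_step(u, dt, dx, nu=1.0):
    """One explicit Euler step of u_t = nu * u_xx, written as a residual update."""
    # The central difference stencil for u_xx is the kernel [1, -2, 1] / dx^2.
    kernel = nu * np.array([1.0, -2.0, 1.0]) / dx**2
    return u + dt * conv_periodic(u, kernel)


# Illustrative grid and initial condition (assumed values).
n = 64
x = np.linspace(0.0, 1.0, n, endpoint=False)
dx = x[1] - x[0]
u0 = np.sin(2.0 * np.pi * x)
dt = 0.25 * dx**2  # satisfies the stability bound dt <= dx^2 / (2 * nu)

u1 = heat_step(u0, dt, dx)

# Reference: the same finite difference update written with index shifts.
reference = u0 + dt * (np.roll(u0, -1) - 2.0 * u0 + np.roll(u0, 1)) / dx**2
max_diff = float(np.max(np.abs(u1 - reference)))
```

The update `u + dt * conv(u)` has the same shape as a residual block with a single convolutional layer, which gives a rough sense of why a convolutional residual network can reproduce a time step of such a discretisation with few parameters.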