Neural scaling laws play a pivotal role in the performance of deep neural networks and have been observed across a wide range of tasks. However, a complete theoretical framework for understanding these scaling laws is still lacking. In this paper, we study the neural scaling laws of deep operator networks, which learn mappings between function spaces, with a focus on the Chen and Chen style architecture. These approaches, which include the popular Deep Operator Network (DeepONet), approximate the output functions by a linear combination of learnable basis functions with coefficients that depend on the input functions. We establish a theoretical framework that quantifies the neural scaling laws by analyzing the approximation and generalization errors of these networks. We articulate how the approximation and generalization errors of deep operator networks depend on key factors such as the network model size and the training data size. Moreover, we address cases where the input functions exhibit low-dimensional structures, which allows us to derive tighter error bounds. These results also hold for deep ReLU networks and other similar architectures. Our results offer a partial explanation of the neural scaling laws in operator learning and provide a theoretical foundation for their applications.
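For concreteness, a Chen and Chen style approximation of an operator \(G\), as instantiated in DeepONet, can be sketched as
\[
G(u)(y) \;\approx\; \sum_{k=1}^{p} b_k\big(u(x_1), \ldots, u(x_m)\big)\, t_k(y),
\]
where the coefficients \(b_k\) (the branch component) depend on the input function \(u\) sampled at sensor points \(x_1, \ldots, x_m\), and the functions \(t_k\) (the trunk component) serve as learnable basis functions evaluated at the query point \(y\). The symbols \(p\), \(m\), \(x_j\), and \(y\) are introduced here for illustration of the architecture described above, not as the paper's own notation.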