Mixture of Experts Softens the Curse of Dimensionality in Operator Learning

We study the approximation-theoretic implications of mixture-of-experts architectures for operator learning, where the complexity of a single large neural operator is distributed across many small neural operators (NOs), and each input is routed to exactly one NO via a decision tree. We analyze how this tree-based routing and expert decomposition affect approximation power, sample complexity, and stability. Our main result is a distributed universal approximation theorem for mixtures of neural operators (MoNOs): any Lipschitz nonlinear operator between $L^2([0,1]^d)$ spaces can be uniformly approximated over the Sobolev unit ball to arbitrary accuracy $\varepsilon>0$ by an MoNO in which each expert NO has depth, width, and rank scaling as $\mathcal{O}(\varepsilon^{-1})$. Although the number of experts may grow as $\varepsilon$ decreases, each NO remains small enough to fit within the active memory of standard hardware at reasonable accuracy levels. Our analysis also yields new quantitative approximation rates for classical NOs approximating uniformly continuous nonlinear operators uniformly on compact subsets of $L^2([0,1]^d)$.
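To make the routing idea concrete, the following is a minimal sketch of a tree-routed mixture in which each input is sent to exactly one expert, so only that expert needs to be evaluated. It is an illustration under stated assumptions, not the construction analyzed in the paper: input functions are discretized on a uniform grid of $[0,1]$ (so $d=1$), the split rule (thresholding a linear functional of the input) is a hypothetical choice, and each "expert" is a tiny two-layer map standing in for a small neural operator. All names (`Leaf`, `Split`, `make_expert`, `mono_forward`, `build_demo_tree`) are invented for this example.

```python
import numpy as np


class Leaf:
    """Terminal node holding one small expert (a stand-in for an expert NO)."""

    def __init__(self, expert_id, expert):
        self.expert_id = expert_id
        self.expert = expert


class Split:
    """Internal node: go left if <direction, u> <= threshold, otherwise right.

    This linear split rule is an assumption made for illustration only.
    """

    def __init__(self, direction, threshold, left, right):
        self.direction = direction
        self.threshold = threshold
        self.left = left
        self.right = right


def make_expert(rng, n_grid, width):
    """Tiny two-layer map on grid values; hypothetical placeholder for a small NO."""
    w_in = rng.standard_normal((width, n_grid)) / np.sqrt(n_grid)
    w_out = rng.standard_normal((n_grid, width)) / np.sqrt(width)
    return lambda u: w_out @ np.tanh(w_in @ u)


def mono_forward(tree, u):
    """Route u down the tree and evaluate only the expert at the reached leaf."""
    node = tree
    while isinstance(node, Split):
        go_left = float(node.direction @ u) <= node.threshold
        node = node.left if go_left else node.right
    return node.expert_id, node.expert(u)


def build_demo_tree(n_grid=32, width=8, seed=0):
    """Depth-2 tree with four experts, split on two simple linear functionals."""
    rng = np.random.default_rng(seed)
    leaves = [Leaf(i, make_expert(rng, n_grid, width)) for i in range(4)]
    x = np.linspace(0.0, 1.0, n_grid)
    mean = np.full(n_grid, 1.0 / n_grid)  # grid average of u
    slope = (x - 0.5) / n_grid            # grid average of (x - 1/2) * u(x)
    left = Split(slope, 0.0, leaves[0], leaves[1])
    right = Split(slope, 0.0, leaves[2], leaves[3])
    return Split(mean, 0.0, left, right)


if __name__ == "__main__":
    tree = build_demo_tree()
    x = np.linspace(0.0, 1.0, 32)
    u = np.sin(2 * np.pi * x) + 0.5
    expert_id, v = mono_forward(tree, u)
    print("routed to expert", expert_id, "output shape", v.shape)
```

In this sketch the cost of one forward pass is the tree depth plus a single small expert evaluation, regardless of how many experts the tree holds. That mirrors the point made above that the number of experts can grow while each individual expert stays small.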
