Meta-learning of numerical algorithms for a given task consists of the data-driven identification and adaptation of an algorithmic structure and the associated hyperparameters. To limit the complexity of the meta-learning problem, neural architectures with a certain inductive bias towards favorable algorithmic structures can, and should, be used. We generalize our previously introduced Runge-Kutta neural network to a recursively recurrent neural network (R2N2) superstructure for the design of customized iterative algorithms. In contrast to off-the-shelf deep learning approaches, it features a distinct division into modules for the generation of information and for the subsequent assembly of this information towards a solution. Local information in the form of a subspace is generated by subordinate inner iterations of recurrent function evaluations starting at the current outer iterate. The update to the next outer iterate is computed as a linear combination of these evaluations that reduces the residual in this subspace, and it constitutes the output of the network. We demonstrate that regular training of the weight parameters inside the proposed superstructure on input/output data of various computational problem classes yields iterations similar to Krylov solvers for linear equation systems, Newton-Krylov solvers for nonlinear equation systems, and Runge-Kutta integrators for ordinary differential equations. Due to its modularity, the superstructure can be readily extended with functionalities needed to represent more general classes of iterative algorithms traditionally based on Taylor series expansions.
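The two-module structure described above (inner iterations that generate a subspace, followed by an outer linear combination) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `inner_evaluations` and `outer_update`, the layout of the weights, and the step-size parameter `h` are assumptions made for illustration. The weights are also not trained here. In the first example they are set by hand to the classical fourth-order Runge-Kutta coefficients; in the second, the outer weights are fitted by least squares for a single linear system, as a stand-in for learned weights that reduce the residual in the generated subspace.

```python
import numpy as np

def inner_evaluations(f, x, inner_w, h=1.0):
    """Inner module (hypothetical name): recurrent evaluations of f.

    Row j of inner_w weights the earlier evaluations that shift the
    point at which evaluation j is taken. Returns the evaluations as
    columns, which span the local subspace.
    """
    stages = []
    for row in inner_w:
        shift = np.zeros_like(x)
        for weight, v in zip(row, stages):
            shift = shift + weight * v
        stages.append(f(x + h * shift))
    return np.stack(stages, axis=1)

def outer_update(x, V, outer_w, h=1.0):
    """Outer module (hypothetical name): linear combination of evaluations."""
    return x + h * (V @ np.asarray(outer_w))

# Example 1: with hand-set classical RK4 coefficients, one outer iteration
# is one RK4 step for the ODE x' = -x.
rk4_inner = [[], [0.5], [0.0, 0.5], [0.0, 0.0, 1.0]]
rk4_outer = [1 / 6, 1 / 3, 1 / 3, 1 / 6]
x0_ode = np.array([1.0])
V_ode = inner_evaluations(lambda x: -x, x0_ode, rk4_inner, h=0.1)
x_ode = outer_update(x0_ode, V_ode, rk4_outer, h=0.1)  # approximates exp(-0.1)

# Example 2: for the residual f(x) = b - A x, chained evaluations span a
# Krylov subspace of A generated by the initial residual.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])

def residual(x):
    return b - A @ x

x0_lin = np.zeros(3)
krylov_inner = [[], [1.0], [0.0, 1.0]]
V = inner_evaluations(residual, x0_lin, krylov_inner)

# Assumption: least-squares outer weights replace trained ones. They minimise
# the residual norm over the subspace, similar in spirit to GMRES.
outer_w, *_ = np.linalg.lstsq(A @ V, residual(x0_lin), rcond=None)
x_lin = outer_update(x0_lin, V, outer_w)
```

In this toy setting the subspace has the same dimension as the 3x3 system, so a single outer iteration already solves it up to rounding. With fewer inner iterations than unknowns, the same update would only reduce the residual, and several outer iterations would be needed.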