This work develops a novel, model-based training method for artificial neural networks (ANNs), grounded in optimal control theory. The method augments the training labels in order to robustly guarantee convergence of the training loss and to improve the convergence rate. Dynamic label augmentation is proposed within the framework of gradient descent training, so that the convergence of the training loss becomes a controlled quantity. First, we capture the training behavior with the help of empirical Neural Tangent Kernels (NTK) and borrow tools from systems and control theory to analyze both the local and global training dynamics (e.g., stability and reachability). Second, we propose to dynamically alter the gradient descent training mechanism by using fictitious labels as control inputs together with an optimal state feedback policy. In this way, we enforce locally $\mathcal{H}_2$ optimal and convergent training behavior. The resulting algorithm, \textit{Controlled Descent Training} (CDT), guarantees local convergence. CDT opens new possibilities for the analysis, interpretation, and design of ANN architectures. The applicability of the method is demonstrated on standard regression and classification problems.
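The idea of treating fictitious labels as control inputs can be illustrated with a minimal sketch. It is not the CDT algorithm itself, and it rests on several assumptions that do not come from the abstract: a squared loss, a fixed (linearized) empirical NTK, and a discrete-time LQR design with identity-scaled weights standing in for the $\mathcal{H}_2$ optimal state feedback. Under these assumptions, the residual $e_k = f_k - y$ along one NTK eigendirection with eigenvalue $\lambda$ evolves as $e_{k+1} = (1 - \eta\lambda)\,e_k + \eta\lambda\,u_k$, where $\eta$ is the learning rate and $u_k$ is the label offset. All names below (`scalar_lqr_gain`, `simulate_mode`, the example spectrum) are hypothetical.

```python
def scalar_lqr_gain(a, b, q=1.0, r=1.0, iters=2000):
    """Gain k of u = -k*e for e_next = a*e + b*u, minimising sum(q*e^2 + r*u^2).

    Solves the scalar discrete-time Riccati equation by fixed-point iteration.
    The weights q and r are illustrative choices, not values from the paper.
    """
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return a * b * p / (r + b * b * p)


def simulate_mode(lam, lr, steps, e0=1.0, controlled=True):
    """Residual of one NTK eigenmode after `steps` linearized descent steps."""
    a = 1.0 - lr * lam  # decay factor of plain gradient descent on this mode
    b = lr * lam        # how a label offset enters the residual dynamics
    k = scalar_lqr_gain(a, b) if controlled else 0.0
    e = e0
    for _ in range(steps):
        u = -k * e      # fictitious label offset chosen by state feedback
        e = a * e + b * u
    return e


eigenvalues = [2.0, 0.5, 0.1]  # hypothetical empirical NTK spectrum
lr = 0.1
plain = [abs(simulate_mode(lam, lr, 50, controlled=False)) for lam in eigenvalues]
ctrl = [abs(simulate_mode(lam, lr, 50, controlled=True)) for lam in eigenvalues]

for lam, p_res, c_res in zip(eigenvalues, plain, ctrl):
    print(f"lambda={lam}: plain residual={p_res:.3e}, controlled residual={c_res:.3e}")
```

In this toy setting the feedback shrinks the per-step contraction factor of every mode with $\lambda > 0$, so the controlled residual is never larger than the uncontrolled one. A mode with $\lambda = 0$ would have $b = 0$ and could not be influenced by the labels at all, which is the kind of question a reachability analysis addresses.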