Deep neural networks (DNNs), trained with gradient-based optimization and backpropagation, are currently the primary tool in modern artificial intelligence, machine learning, and data science. In many applications, DNNs are trained offline, through supervised learning or reinforcement learning, and then deployed online for inference. However, training DNNs with standard backpropagation and gradient-based optimization provides no intrinsic performance guarantees or bounds on the DNN, both of which are essential in applications such as control. Additionally, many problems that combine offline training with online inference, such as sim2real transfer of reinforcement learning policies, suffer from domain shift between the training distribution and the real-world distribution. To address these stability and transfer learning issues, we propose using techniques from control theory to update DNN parameters online. We formulate the fully-connected feedforward DNN as a continuous-time dynamical system, and we propose novel last-layer update laws that guarantee desirable error convergence under various conditions on the time derivative of the DNN input vector. We further show that training the DNN under spectral normalization controls the upper bound of the error trajectories of the online DNN predictions, which is desirable when numerically differentiated quantities or noisy state measurements are input to the DNN. The proposed online DNN adaptation laws are validated in simulation by learning the dynamics of the Van der Pol system under domain shift, where the system parameters at inference time differ from those used to generate the training dataset. The simulations demonstrate the effectiveness of using control-theoretic techniques to derive performance improvements and guarantees in DNN-based learning systems.
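To make the idea of online last-layer adaptation under domain shift concrete, the following is a minimal sketch and not the update laws proposed in this work. It assumes a fixed random `tanh` feature layer standing in for the pretrained hidden layers, a standard gradient-type adaptive law `W_dot = gamma * phi(x) * e^T` discretized with forward Euler, and direct access to the true state derivative as the learning target. The names `features`, `run_online`, `gamma`, and the parameter values `mu = 1` (training) and `mu = 2` (inference) are illustrative assumptions.

```python
import numpy as np

def van_der_pol(x, mu):
    """Van der Pol vector field: x1' = x2, x2' = mu*(1 - x1^2)*x2 - x1."""
    return np.array([x[1], mu * (1.0 - x[0] ** 2) * x[1] - x[0]])

def simulate(mu, steps, dt, x0=(1.0, 0.0)):
    """Forward-Euler rollout; returns the states and their time derivatives."""
    xs = np.empty((steps, 2))
    dxs = np.empty((steps, 2))
    x = np.array(x0, dtype=float)
    for k in range(steps):
        dx = van_der_pol(x, mu)
        xs[k], dxs[k] = x, dx
        x = x + dt * dx
    return xs, dxs

# Hypothetical stand-in for the pretrained hidden layers: fixed random features.
rng = np.random.default_rng(0)
A = 0.5 * rng.normal(size=(32, 2))
b = rng.normal(size=32)

def features(x):
    """Maps a state (2,) or a batch of states (N, 2) to features (32,) or (N, 32)."""
    return np.tanh(x @ A.T + b)

def run_online(W0, xs, dxs, dt, gamma):
    """Runs the predictor along a trajectory; gamma = 0 keeps the last layer frozen."""
    W = W0.copy()
    sq_err = np.empty(len(xs))
    for k in range(len(xs)):
        phi = features(xs[k])
        e = dxs[k] - W.T @ phi                # prediction error before the update
        sq_err[k] = e @ e
        W += dt * gamma * np.outer(phi, e)    # Euler step of W_dot = gamma * phi * e^T
    return sq_err.mean()

dt, steps = 1e-3, 10_000

# Offline phase: ridge-regression fit of the last layer on data generated with mu = 1.
xs_train, dxs_train = simulate(mu=1.0, steps=steps, dt=dt)
Phi = features(xs_train)
W0 = np.linalg.solve(Phi.T @ Phi + 1.0 * np.eye(32), Phi.T @ dxs_train)

# Online phase under domain shift: the true system now has mu = 2.
xs_test, dxs_test = simulate(mu=2.0, steps=steps, dt=dt)
frozen_mse = run_online(W0, xs_test, dxs_test, dt, gamma=0.0)
adapted_mse = run_online(W0, xs_test, dxs_test, dt, gamma=20.0)

print(f"frozen last layer MSE:  {frozen_mse:.4g}")
print(f"adapted last layer MSE: {adapted_mse:.4g}")
```

The gain is chosen so that `dt * gamma * ||phi||^2` stays below 2 (here at most `1e-3 * 20 * 32 = 0.64`), which keeps the discretized update from overshooting the instantaneous error. In practice the state derivative would typically be obtained by numerical differentiation of noisy measurements, which is the setting where the spectral normalization result described above becomes relevant.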