Complex mechanical systems such as vehicle powertrains are inherently subject to multiple nonlinearities and to uncertainties arising from parametric variations. Modeling errors are therefore unavoidable, making the transfer of control systems from simulation to real-world systems a critical challenge. Traditional robust control methods have limitations in handling certain types of nonlinearities and uncertainties, so a more practical approach capable of comprehensively compensating for these various constraints is required. This study proposes a new robust control approach built on the framework of deep reinforcement learning (DRL). The key strategy lies in the synergy among domain-randomization-based DRL, long short-term memory (LSTM)-based actor and critic networks, and model-based control (MBC). The problem is modeled as a latent Markov decision process (LMDP), a set of vanilla MDPs, for a controlled system subject to uncertainties and nonlinearities. In the LMDP, the dynamics of the environment simulator are randomized during training to improve the robustness of the control system in real testing environments. This randomization increases training difficulty as well as the conservativeness of the resulting control system; training is therefore assisted by the concurrent use of a model-based controller built on a physics-based system model. Compared to traditional DRL-based controls, the proposed approach achieves a high level of generalization with a more compact neural network architecture and a smaller amount of training data. The controller is verified through a practical application to active damping for a complex powertrain system with nonlinearities and parametric variations. Comparative tests demonstrate the high robustness of the proposed approach.
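The domain-randomization step described above can be illustrated with a minimal sketch: plant parameters are redrawn from assumed ranges at the start of each training episode so the policy never overfits a single simulator configuration. The parameter names (`stiffness`, `damping`) and their ±20–30% ranges are hypothetical placeholders, not values from the paper; the rollout and policy-update logic is elided.

```python
# Minimal sketch of domain randomization for DRL training.
# NOTE: parameter names and ranges below are illustrative assumptions,
# not the paper's actual powertrain model.
import random

def sample_env_params(rng):
    # Each episode draws plant parameters around their nominal values,
    # mimicking the parametric variations of the real system.
    return {
        "stiffness": rng.uniform(0.8, 1.2),  # +/-20% around nominal
        "damping":   rng.uniform(0.7, 1.3),  # +/-30% around nominal
    }

def train(num_episodes=3, seed=0):
    rng = random.Random(seed)
    history = []
    for episode in range(num_episodes):
        params = sample_env_params(rng)
        # env = make_env(**params)
        # ...rollout with the LSTM actor-critic and policy update go here...
        history.append(params)
    return history

if __name__ == "__main__":
    for params in train():
        print(params)
```

Because a fresh parameter set is sampled per episode, the learned policy must perform well across the whole randomized family of dynamics, which is what transfers robustness to the (unseen) real plant.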