Controlling high-dimensional stochastic systems, a problem central to robotics, autonomous vehicles, and hyperchaotic dynamics, is hampered by the curse of dimensionality, and existing approaches typically lack temporal abstraction and fail to guarantee stochastic stability. To overcome these limitations, this study introduces the Multi-Timescale Lyapunov-Constrained Hierarchical Reinforcement Learning (MTLHRL) framework. MTLHRL embeds a hierarchical policy within a semi-Markov Decision Process (SMDP): a high-level policy performs strategic planning over long horizons while a low-level policy handles reactive control, together managing complex multi-timescale decision-making and reducing dimensionality overhead. Stability is rigorously enforced through a neural Lyapunov function optimized via Lagrangian relaxation and multi-timescale actor-critic updates, ensuring mean-square boundedness or asymptotic stability under stochastic dynamics. Trust-region constraints and decoupled optimization further promote efficient and reliable learning. Extensive simulations on an 8D hyperchaotic system and a 5-DOF robotic manipulator demonstrate that MTLHRL significantly outperforms baseline methods in both stability and performance: it records the lowest error indices (e.g., an Integral Absolute Error (IAE) of 3.912 in hyperchaotic control and 1.623 in the robotic task), converges faster, and exhibits superior disturbance rejection. MTLHRL thus offers a theoretically grounded and practically viable solution for the robust control of complex stochastic systems.
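To make the hierarchical SMDP structure summarized above concrete, the following is a minimal sketch of a two-timescale control loop in which a high-level policy commits to a subgoal for k low-level steps while a low-level policy acts at every step. The environment interface, the decision horizon k, and both policy objects are hypothetical placeholders for illustration, not the paper's implementation.

```python
# Minimal sketch of the two-level SMDP control loop: the high-level policy
# re-plans a subgoal every k steps (temporal abstraction), while the
# low-level policy issues a reactive action at every step. `env`,
# `high_policy`, `low_policy`, and `k` are illustrative assumptions.

def rollout(env, high_policy, low_policy, k=10, horizon=1000):
    """Run one episode with temporally abstracted high-level decisions."""
    s = env.reset()
    goal = None
    total_reward = 0.0
    for t in range(horizon):
        if t % k == 0:              # slow timescale: strategic planning
            goal = high_policy(s)
        a = low_policy(s, goal)     # fast timescale: reactive control
        s, r, done = env.step(a)
        total_reward += r
        if done:
            break
    return total_reward
```

Holding the subgoal fixed between high-level decisions is what yields the semi-Markov (variable-duration) structure: the high-level policy effectively acts on a coarser clock than the low-level controller.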
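The Lyapunov constraint can likewise be sketched as a Lagrangian relaxation: the policy loss is penalized by the expected violation of a Lyapunov decrease condition, and the multiplier is updated by dual ascent on a slower timescale. The network architecture, the decrease margin alpha, and the learning-rate split below are assumptions made for illustration; they are not taken from the paper.

```python
# Sketch of a neural Lyapunov candidate and a Lagrangian-relaxed update
# enforcing E[V(s') - V(s)] <= -alpha * E[V(s)]. The architecture, margin
# `alpha`, and dual step size `eta_lam` are illustrative assumptions.
import torch
import torch.nn as nn

class LyapunovNet(nn.Module):
    """Lyapunov candidate V(s) >= 0; nonnegativity holds by squaring."""
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s):
        return self.net(s).pow(2)

def constrained_losses(policy_loss, V, s, s_next, lam, alpha=0.1):
    """Primal Lagrangian (minimized over policy and V parameters) and the
    constraint residual used for the slower dual-ascent update of `lam`."""
    residual = (V(s_next) - V(s) + alpha * V(s)).mean()  # <= 0 if satisfied
    lagrangian = policy_loss + lam * residual
    return lagrangian, residual

# Two-timescale pattern: a fast primal step on the Lagrangian, then a slow
# dual step, e.g. lam = max(0.0, lam + eta_lam * residual.item()), with
# eta_lam much smaller than the actor-critic learning rates.
```

Keeping the multiplier on the slowest timescale mirrors the multi-timescale actor-critic scheme described in the abstract, so the primal players approximately equilibrate between each dual update.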