This paper introduces a novel operator, termed the Y operator, to improve control performance in Actor-Critic (AC) based reinforcement learning (RL) for systems governed by stochastic differential equations (SDEs). The Y operator integrates the stochasticity of a class of child-mother systems into the Critic network's loss function, which substantially improves the control performance of RL algorithms. In addition, the Y operator reformulates the problem of solving partial differential equations for the state-value function as a parallel problem for the drift and diffusion functions of the system's SDEs. A rigorous mathematical proof confirms the operator's validity. This transformation enables the Y Operator-based Reinforcement Learning (YORL) framework to efficiently solve optimal control problems in both model-based and data-driven systems. Linear and nonlinear numerical examples demonstrate that YORL outperforms existing methods after convergence.
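As background for the claim that the value-function PDE can be restated in terms of the drift and diffusion functions, the following sketch recalls the standard setting of a controlled Itô SDE and its infinitesimal generator. The symbols $f$, $g$, $V$, $W_t$, and $\mathcal{L}^{u}$ are assumptions introduced here for illustration; this is textbook stochastic calculus, not the paper's definition of the Y operator.

```latex
% Controlled SDE with drift f and diffusion g (assumed notation)
\[
  \mathrm{d}x_t = f(x_t, u_t)\,\mathrm{d}t + g(x_t, u_t)\,\mathrm{d}W_t
\]
% For a twice continuously differentiable state-value function V,
% Ito's lemma gives the generator acting on V:
\[
  \mathcal{L}^{u} V(x)
  = \nabla V(x)^{\top} f(x, u)
  + \tfrac{1}{2}\,\operatorname{tr}\!\bigl( g(x, u)\, g(x, u)^{\top}\, \nabla^{2} V(x) \bigr)
\]
```

In this form the drift $f$ and the diffusion $g$ are the only system-dependent terms in the PDE for $V$, which is why knowing or estimating them, from a model or from data, is sufficient to evaluate the stochastic part of a Critic loss.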