When control problems such as regulation and tracking are addressed through reinforcement learning, it is often necessary to guarantee, prior to deployment, that the learned policy meets essential performance and stability criteria, such as a desired settling time and steady-state error. Motivated by this need, we present a set of results and a systematic reward shaping procedure that (i) ensures that the optimal policy generates trajectories satisfying the specified control requirements and (ii) makes it possible to assess whether any given policy satisfies them. We validate our approach through comprehensive numerical experiments in two representative environments from OpenAI Gym: the Inverted Pendulum swing-up problem and the Lunar Lander. Using both tabular and deep reinforcement learning methods, the experiments consistently show that policies obtained with the proposed framework adhere to the prescribed control requirements.