Tasks for autonomous robotic systems commonly require stabilization to a desired region while maintaining safety specifications. However, solving this multi-objective problem is challenging when the dynamics are nonlinear and high-dimensional, as traditional methods do not scale well and are often limited to specific problem structures. To address this issue, we propose a novel approach to solve the stabilize-avoid problem via the solution of an infinite-horizon constrained optimal control problem (OCP). We transform the constrained OCP into epigraph form and obtain a two-stage optimization problem that optimizes over the policy in the inner problem and over an auxiliary variable in the outer problem. We then propose a new method for this formulation that combines an on-policy deep reinforcement learning algorithm with neural network regression. Compared to more traditional methods, our method trains more stably, avoids the instabilities caused by saddle-point finding, and does not require a specific problem structure. We validate our approach on different benchmark tasks, ranging from low-dimensional toy examples to an F16 fighter jet with a 17-dimensional state space. Simulation results show that our approach consistently yields controllers that match or exceed the safety of existing methods while providing tenfold increases in stability performance through larger regions of attraction.
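The epigraph transformation mentioned above can be illustrated with a generic sketch. The notation here is assumed for illustration and is not taken from the abstract: $\ell$ is a stage cost, $h$ is a constraint function with $h(x) \le 0$ on the safe set, $f$ is the discrete-time dynamics, $\pi$ is the policy, and $z$ is the auxiliary variable. The sketch also assumes the inner minimum over $\pi$ is attained.

```latex
% Constrained infinite-horizon OCP (assumed generic form)
\begin{align*}
  \min_{\pi} \quad & J^{\pi}(x_0) := \sum_{k=0}^{\infty} \ell(x_k) \\
  \text{s.t.} \quad & h(x_k) \le 0 \quad \forall k \ge 0, \qquad
                      x_{k+1} = f\bigl(x_k, \pi(x_k)\bigr).
\end{align*}

% Epigraph form: introduce an auxiliary variable z that upper-bounds the cost
\begin{align*}
  \min_{z,\,\pi} \quad & z \\
  \text{s.t.} \quad & J^{\pi}(x_0) \le z, \qquad h(x_k) \le 0 \quad \forall k \ge 0.
\end{align*}

% Two-stage form: both constraints hold iff their maximum is nonpositive,
% so the policy can be optimized in an inner problem and z in an outer one
\begin{align*}
  \min_{z} \quad & z \\
  \text{s.t.} \quad & \min_{\pi} \, \max\Bigl\{ \sup_{k \ge 0} h(x_k),\; J^{\pi}(x_0) - z \Bigr\} \le 0.
\end{align*}
```

In this form the inner problem is an unconstrained minimization over the policy for a fixed $z$, and the outer problem is a one-dimensional search over $z$, which is one way to read the "two-stage" structure described above.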