Several earlier studies have demonstrated impressive control performance in complex robotic systems by representing the controller as a neural network and training it with model-free reinforcement learning. However, these controllers, with their natural motion style and high task performance, are developed through extensive reward engineering: a laborious and time-consuming process of designing numerous reward terms and determining suitable reward coefficients. In this work, we propose a novel reinforcement learning framework, consisting of both rewards and constraints, for training neural network controllers for complex robotic systems. To let engineers express their intent through constraints and handle them with minimal computational overhead, we introduce two constraint types and an efficient policy optimization algorithm. The learning framework is applied to train locomotion controllers for several legged robots with different morphologies and physical attributes to traverse challenging terrains. Extensive simulation and real-world experiments demonstrate that performant controllers can be trained with significantly less reward engineering, by tuning only a single reward coefficient. Furthermore, the engineering process becomes more straightforward and intuitive, thanks to the interpretability and generalizability of constraints. The summary video is available at https://youtu.be/KAlm3yskhvM.