Generating physical movement behaviours from their symbolic description is a long-standing challenge in artificial intelligence (AI) and robotics, requiring insights into numerical optimization methods as well as into formalizations from symbolic AI and reasoning. In this paper, a novel approach to deriving a reward function from a symbolic description is proposed. The intended system behaviour is modelled as a hybrid automaton, which reduces the system state space and allows more efficient reinforcement learning. The approach is applied to bipedal walking: the walking robot is modelled as a hybrid automaton over state space orthants, and the model is used with the compass walker to derive a reward that incentivizes following the hybrid automaton cycle. As a result, training times of reinforcement learning controllers are reduced while final walking speed is increased. The approach can serve as a blueprint for generating reward functions from symbolic AI and reasoning.
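To make the idea of a reward that follows a cycle over state space orthants more concrete, a minimal sketch is given here. It is an illustration under stated assumptions and not the reward derived in the paper: the function names `orthant` and `cycle_reward`, the two-dimensional example cycle `EXAMPLE_CYCLE`, and the bonus and penalty values are all hypothetical. Each discrete mode is identified with the sign pattern of the continuous state, and a transition is rewarded only if it moves to the next orthant in the prescribed cycle.

```python
# Hypothetical sketch: reward for following a prescribed cycle of orthants.
# The cycle, names, and reward magnitudes are assumptions for illustration only.

def orthant(state):
    """Map a continuous state to its orthant, encoded as a tuple of signs."""
    return tuple(1 if x >= 0 else -1 for x in state)


def cycle_reward(prev_state, state, cycle, bonus=1.0, penalty=-1.0):
    """Return a reward for the transition prev_state -> state.

    0.0     if the state stays in the same orthant (no discrete transition),
    bonus   if it moves to the successor orthant in `cycle`,
    penalty for any other orthant change.
    """
    prev_o, cur_o = orthant(prev_state), orthant(state)
    if cur_o == prev_o:
        return 0.0
    if prev_o in cycle:
        successor = cycle[(cycle.index(prev_o) + 1) % len(cycle)]
        if cur_o == successor:
            return bonus
    return penalty


# Illustrative 2-D cycle; NOT the compass walker's actual automaton.
EXAMPLE_CYCLE = [(1, 1), (1, -1), (-1, -1), (-1, 1)]

if __name__ == "__main__":
    print(cycle_reward((0.3, 0.2), (0.3, -0.2), EXAMPLE_CYCLE))   # follows the cycle
    print(cycle_reward((0.3, 0.2), (-0.3, 0.2), EXAMPLE_CYCLE))   # leaves the cycle
```

In a reinforcement learning setup, a term of this kind would typically be added to the task reward, so that the discrete structure of the automaton guides exploration while the continuous dynamics are still learned by the controller.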