In dynamic motion generation tasks that involve contact and collisions, small changes in policy parameters can lead to drastically different returns. For example, in soccer, a similar heading motion can send the ball in completely different directions if the hitting position or the force applied to the ball changes slightly, or if the friction of the ball varies. However, it is hard to imagine that heading a ball in different directions requires completely different skills. In this study, we propose a multitask reinforcement learning algorithm that adapts a policy to implicit changes in goals or environments within a single motion category, where tasks differ in their reward functions or in the physical parameters of the environment. We evaluated the proposed method on a ball heading task using a monopod robot model. The results show that the proposed method can adapt to implicit changes in the goal positions or in the coefficient of restitution of the ball, whereas a standard domain randomization approach cannot cope with the different task settings.