In search of a simple baseline for deep reinforcement learning (DRL) in locomotion tasks, we propose a model-free, open-loop strategy. By leveraging prior knowledge and the elegance of simple oscillators to generate periodic joint motions, it achieves respectable performance in five different locomotion environments, with a number of tunable parameters that is a tiny fraction of the thousands typically required by DRL algorithms. We conduct two additional experiments using open-loop oscillators to identify current shortcomings of these algorithms. Our results show that, compared to the baseline, DRL is more prone to performance degradation when exposed to sensor noise or failure. Furthermore, we demonstrate a successful transfer from simulation to reality using an elastic quadruped, a setting in which RL fails without randomization or reward engineering. Overall, the proposed baseline and associated experiments highlight the existing limitations of DRL for robotic applications, provide insights into how to address them, and encourage reflection on the costs of complexity and generality.
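To make "simple oscillators to generate periodic joint motions" concrete, the following is a minimal sketch, not the authors' implementation. It assumes one sinusoid per joint, q_i(t) = a_i · sin(2π f t + φ_i) + b_i, with a per-joint amplitude a_i, phase φ_i and offset b_i and a single shared frequency f. The class name `OpenLoopOscillator` and this exact parameterization are hypothetical choices made for illustration.

```python
import numpy as np


class OpenLoopOscillator:
    """Hypothetical open-loop policy: one sinusoid per joint.

    Assumed form of the desired position for joint i at time t:
        q_i(t) = a_i * sin(2 * pi * f * t + phi_i) + b_i
    The output depends only on time, never on observations.
    """

    def __init__(self, amplitudes, phases, offsets, frequency):
        self.a = np.asarray(amplitudes, dtype=float)
        self.phi = np.asarray(phases, dtype=float)
        self.b = np.asarray(offsets, dtype=float)
        self.f = float(frequency)
        if not (self.a.shape == self.phi.shape == self.b.shape):
            raise ValueError("amplitudes, phases and offsets must have the same shape")

    @property
    def num_params(self):
        # Three tunable parameters per joint plus one shared frequency.
        return 3 * self.a.size + 1

    def __call__(self, t, observation=None):
        # `observation` is accepted only for interface compatibility; it is ignored.
        return self.a * np.sin(2.0 * np.pi * self.f * t + self.phi) + self.b


# Usage: a six-joint gait where alternating joints are half a period out of phase.
policy = OpenLoopOscillator(
    amplitudes=[0.5] * 6,
    phases=[0.0, np.pi] * 3,
    offsets=[0.0] * 6,
    frequency=1.5,
)
dt = 0.01  # assumed control period in seconds
trajectory = np.stack([policy(step * dt) for step in range(200)])
```

In this sketch a six-joint robot needs only 3 × 6 + 1 = 19 parameters, which a black-box optimizer could tune. Because the policy never reads sensor values, noisy or failed sensors cannot change the commands it produces, which illustrates why an open-loop baseline is a useful reference point when testing robustness to sensor noise or failure.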