In search of the simplest baseline capable of competing with Deep Reinforcement Learning (DRL) on locomotion tasks, we propose a biologically inspired, model-free, open-loop strategy. Drawing on prior knowledge and using simple oscillators to generate periodic joint motions, it achieves respectable performance in five different locomotion environments with a number of tunable parameters that is a tiny fraction of the thousands typically required by RL algorithms. Unlike RL methods, which are prone to performance degradation when exposed to sensor noise or failure, our open-loop oscillators are remarkably robust because they do not rely on sensors. Furthermore, we demonstrate successful transfer from simulation to reality on an elastic quadruped, without randomization or reward engineering. Overall, the proposed baseline and associated experiments highlight existing limitations of DRL for robotic applications, provide insights on how to address them, and encourage reflection on the costs of complexity and generality.
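The core idea of generating each joint's target position from a simple periodic signal, without reading any sensors, can be sketched as follows. This is a minimal illustration under stated assumptions. The sine parametrization (one amplitude, phase, and offset per joint plus a single shared frequency) is an assumed form, and the class name `OpenLoopOscillator` and its methods are hypothetical. None of these come from the text above. The sketch shows two things. First, why the parameter count stays small: 3n + 1 for n joints, i.e. 19 for a hypothetical six-joint robot. Second, why sensor noise cannot affect the output: `action` depends on time only.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class OpenLoopOscillator:
    """Hypothetical open-loop controller: one sine oscillator per joint.

    Assumed parametrization of the desired position of joint i at time t:
        q_i(t) = amplitude_i * sin(2 * pi * frequency * t + phase_i) + offset_i
    No sensor readings are used; the output depends only on time.
    """

    amplitudes: List[float]
    phases: List[float]  # radians
    offsets: List[float]
    frequency: float  # shared oscillation frequency in Hz (assumption)

    def __post_init__(self) -> None:
        n = len(self.amplitudes)
        if len(self.phases) != n or len(self.offsets) != n:
            raise ValueError("amplitudes, phases and offsets must have the same length")

    def num_parameters(self) -> int:
        # Three tunable parameters per joint plus one shared frequency.
        return 3 * len(self.amplitudes) + 1

    def action(self, t: float) -> List[float]:
        # Desired joint positions at time t: a pure function of time (open loop).
        omega = 2.0 * math.pi * self.frequency
        return [
            a * math.sin(omega * t + phi) + b
            for a, phi, b in zip(self.amplitudes, self.phases, self.offsets)
        ]


if __name__ == "__main__":
    # Hypothetical six-joint robot with alternating joints in anti-phase.
    controller = OpenLoopOscillator(
        amplitudes=[0.5] * 6,
        phases=[0.0, math.pi] * 3,
        offsets=[0.0] * 6,
        frequency=1.0,
    )
    print(controller.num_parameters())  # 19
    print(controller.action(0.25))
```

Because no observation enters `action`, corrupted or missing sensor readings leave the commanded trajectory unchanged. The trade-off is that a purely open-loop controller cannot react to disturbances it does not sense.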