Evolution and learning in differentiable robots

The automatic design of robots has existed for 30 years but has been constrained by serial, non-differentiable design evaluations, premature convergence to simple bodies or clumsy behaviors, and a lack of sim2real transfer to physical machines. Here we employ massively parallel differentiable simulations to rapidly and simultaneously optimize individual neural control of behavior across a large population of candidate body plans, and to return a fitness score for each design based on the performance of its fully optimized behavior. Non-differentiable changes to the mechanical structure of each robot in the population -- mutations that rearrange, combine, add, or remove body parts -- are applied by a genetic algorithm in an outer loop of search, generating a continuous flow of novel morphologies with highly coordinated and graceful behaviors honed by gradient descent. This enabled the exploration of several orders of magnitude more designs than all previous methods, even though the robots here can be much more complex, in terms of the number of independent motors, than those in prior studies. We found that evolution reliably produces ``increasingly differentiable'' robots: body plans that smooth the loss landscape in which learning operates and thereby provide better training paths toward performant behaviors. Finally, one of the highly differentiable morphologies discovered in simulation was realized as a physical robot and shown to retain its optimized behavior. This provides a cyberphysical platform for investigating the relationship between evolution and learning in biological systems, and it broadens our understanding of how a robot's physical structure can influence the ability to train policies for it. Videos and code are available at https://sites.google.com/view/eldir.
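
The two nested loops described above (gradient descent on the controller inside, a genetic algorithm over body plans outside) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not the authors' code: the "simulation" is a toy quadratic loss with an analytic gradient, a body is just a list of part lengths, only the add and remove mutations are shown, and candidates are evaluated serially rather than in parallel. The names `loss_and_grad`, `train_controller`, `mutate`, and `evolve` are hypothetical.

```python
import random

MAX_PARTS = 6  # assumed cap on body size, so the toy search stays bounded


def loss_and_grad(body, weights):
    """Toy stand-in for a differentiable simulation (an assumption).

    Part i with length body[i] and control weight w_i contributes
    displacement body[i] * w_i at an energy cost of w_i ** 2.
    The loss is energy minus displacement, so lower is better.
    """
    loss = sum(w * w - b * w for b, w in zip(body, weights))
    grad = [2.0 * w - b for b, w in zip(body, weights)]
    return loss, grad


def train_controller(body, steps=100, lr=0.1):
    """Inner loop: gradient descent on the controller of one fixed body.

    Returns the fitness of the body, defined here as the negative loss
    reached by its trained controller.
    """
    weights = [0.0] * len(body)
    for _ in range(steps):
        _, grad = loss_and_grad(body, weights)
        weights = [w - lr * g for w, g in zip(weights, grad)]
    final_loss, _ = loss_and_grad(body, weights)
    return -final_loss


def mutate(body, rng):
    """Non-differentiable structural change: add or remove one part."""
    child = list(body)
    op = rng.choice(["add", "remove"])
    if op == "add" and len(child) < MAX_PARTS:
        child.insert(rng.randrange(len(child) + 1), rng.uniform(0.1, 1.0))
    elif op == "remove" and len(child) > 1:
        child.pop(rng.randrange(len(child)))
    return child


def evolve(generations=20, pop_size=8, seed=0):
    """Outer loop: a simple elitist genetic algorithm over body plans."""
    rng = random.Random(seed)
    population = [[rng.uniform(0.1, 1.0)] for _ in range(pop_size)]
    best_per_generation = []
    for _ in range(generations):
        children = [mutate(body, rng) for body in population]
        candidates = population + children
        # Each candidate is scored by its fully trained behavior.
        candidates.sort(key=train_controller, reverse=True)
        population = candidates[:pop_size]
        best_per_generation.append(train_controller(population[0]))
    return population[0], best_per_generation


if __name__ == "__main__":
    best_body, history = evolve()
    print("parts in best body:", len(best_body))
    print("best fitness per generation:", [round(f, 3) for f in history])
```

The point of the sketch is the division of labor: the mutation operator never needs a gradient, while each body's fitness is only measured after its controller has been trained. Because parents compete with their children for survival, the best fitness in this toy version cannot decrease from one generation to the next.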
