Evolution and learning in differentiable robots

The automatic design of robots has existed for 30 years but has been constrained by serial, non-differentiable design evaluations, premature convergence to simple bodies or clumsy behaviors, and a lack of sim2real transfer to physical machines. Here we employ massively parallel differentiable simulations to rapidly and simultaneously optimize individual neural control of behavior across a large population of candidate body plans, returning a fitness score for each design based on the performance of its fully optimized behavior. Non-differentiable changes to the mechanical structure of each robot in the population (mutations that rearrange, combine, add, or remove body parts) are applied by a genetic algorithm in an outer loop of search, generating a continuous flow of novel morphologies with highly coordinated and graceful behaviors honed by gradient descent. This enabled the exploration of several orders of magnitude more designs than all previous methods, even though the robots here can be much more complex, in terms of number of independent motors, than those in prior studies. We found that evolution reliably produces "increasingly differentiable" robots: body plans that smooth the loss landscape in which learning operates and thereby provide better training paths toward performant behaviors. Finally, one of the highly differentiable morphologies discovered in simulation was realized as a physical robot and shown to retain its optimized behavior. This provides a cyberphysical platform to investigate the relationship between evolution and learning in biological systems and broadens our understanding of how a robot's physical structure can influence the ability to train policies for it. Videos and code at https://sites.google.com/view/eldir.
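
The two nested loops described above can be illustrated with a minimal sketch. This is not the authors' simulator or code: the "body" is a hypothetical tuple of part sizes, the "controller" is one weight per part, and the loss is a toy quadratic chosen so that the body determines how well a fixed budget of gradient descent steps can train the controller. All names and constants (`TARGET`, `LR`, `STEPS`, `MAX_PARTS`, `train_and_score`, `mutate`, `evolve`) are assumptions made for illustration, and bodies are evaluated serially here rather than in parallel.

```python
import random

TARGET = 5.0     # hypothetical task goal (e.g. a desired displacement)
LR = 0.05        # fixed learning rate for the inner loop
STEPS = 20       # fixed training budget per body
MAX_PARTS = 8    # cap on body size in this toy example


def train_and_score(body):
    """Inner loop: gradient descent on one control weight per body part.

    Toy loss: (sum_i part_i * w_i - TARGET)^2.
    Returns fitness = -(loss after training), so higher is better.
    """
    weights = [0.0] * len(body)
    for _ in range(STEPS):
        err = sum(p * w for p, w in zip(body, weights)) - TARGET
        # Analytic gradient of the toy loss: dL/dw_i = 2 * err * part_i
        weights = [w - LR * 2.0 * err * p for p, w in zip(body, weights)]
    err = sum(p * w for p, w in zip(body, weights)) - TARGET
    return -err * err


def mutate(body, rng):
    """Outer-loop operator: a non-differentiable edit to the body."""
    body = list(body)
    op = rng.choice(["add", "remove", "resize"])
    if op == "add" and len(body) < MAX_PARTS:
        body.insert(rng.randrange(len(body) + 1), rng.uniform(0.1, 3.0))
    elif op == "remove" and len(body) > 1:
        body.pop(rng.randrange(len(body)))
    else:
        # Resize one part (also the fallback when add/remove is not allowed).
        body[rng.randrange(len(body))] = rng.uniform(0.1, 3.0)
    return tuple(body)


def evolve(pop_size=16, generations=30, seed=0):
    """Genetic algorithm over bodies; each body is scored after training."""
    rng = random.Random(seed)
    population = [(rng.uniform(0.1, 3.0),) for _ in range(pop_size)]
    history = []  # best trained fitness per generation
    for _ in range(generations):
        scored = sorted(population, key=train_and_score, reverse=True)
        history.append(train_and_score(scored[0]))
        parents = scored[: pop_size // 2]  # truncation selection, parents kept
        children = [
            mutate(rng.choice(parents), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    best = max(population, key=train_and_score)
    return best, train_and_score(best), history


if __name__ == "__main__":
    best_body, best_fitness, _ = evolve()
    print("best body:", best_body)
    print("fitness after training:", best_fitness)
```

In this toy setting, bodies whose part sizes are too large make gradient descent diverge at the fixed learning rate, and bodies whose parts are too small converge slowly, so selection on post-training fitness favors bodies that are easier to train. That is only a loose analogy to the "increasingly differentiable" robots reported above, not a reproduction of the result.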
