Autonomous robots deployed in the real world will need control policies that rapidly adapt to environmental changes. To this end, we propose AutoRobotics-Zero (ARZ), a method based on AutoML-Zero that discovers zero-shot adaptable policies from scratch. In contrast to neural network adaptation policies, where only model parameters are optimized, ARZ can build control algorithms with the full expressive power of a linear register machine. We evolve modular policies that tune their model parameters and alter their inference algorithm on-the-fly to adapt to sudden environmental changes. We demonstrate our method on a realistic simulated quadruped robot, for which we evolve safe control policies that avoid falling when individual limbs suddenly break. This is a challenging task on which two popular neural network baselines fail. Finally, we conduct a detailed analysis of our method on a novel and challenging non-stationary control task dubbed Cataclysmic Cartpole. The results confirm that ARZ is significantly more robust to sudden environmental changes and can build simple, interpretable control policies.
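To make "the full expressive power of a linear register machine" more concrete, here is a minimal toy sketch in Python. The instruction set (`add`, `sub`, `mul`, `max`), the register layout, and the hand-written program are all hypothetical assumptions for illustration; they are not ARZ's actual instruction set or an evolved policy, and the evolutionary search itself is not shown. The sketch only illustrates how a straight-line program can both compute an action and rewrite one of its own parameters in registers that persist across steps within an episode.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical scalar instruction set: each op reads two registers, writes one.
OPS: Dict[str, Callable[[float, float], float]] = {
    "add": lambda x, y: x + y,
    "sub": lambda x, y: x - y,
    "mul": lambda x, y: x * y,
    "max": lambda x, y: max(x, y),
}


@dataclass(frozen=True)
class Instr:
    op: str   # key into OPS
    dst: int  # register written
    a: int    # first source register
    b: int    # second source register


class RegisterPolicy:
    """Straight-line register program used as a control policy.

    Registers are restored only at episode start, so values written during
    one step (e.g. an adapted gain) are still present at the next step.
    """

    def __init__(self, program: List[Instr], init_regs: List[float],
                 num_obs: int, action_reg: int) -> None:
        self.program = program
        self.init_regs = list(init_regs)
        self.num_obs = num_obs
        self.action_reg = action_reg
        self.reset()

    def reset(self) -> None:
        # Restore constants and initial parameters at the start of an episode.
        self.regs = list(self.init_regs)

    def act(self, obs: List[float]) -> float:
        if len(obs) != self.num_obs:
            raise ValueError("observation size mismatch")
        for i, x in enumerate(obs):  # observations occupy registers 0..num_obs-1
            self.regs[i] = x
        for ins in self.program:  # execute instructions in order, no branches
            self.regs[ins.dst] = OPS[ins.op](self.regs[ins.a], self.regs[ins.b])
        return self.regs[self.action_reg]


def make_adaptive_gain_policy() -> RegisterPolicy:
    # Hypothetical layout: r0 error, r1 gain k, r2 step size, r3 action, r4 scratch
    program = [
        Instr("mul", 3, 1, 0),  # action = k * error
        Instr("mul", 4, 0, 0),  # scratch = error^2
        Instr("mul", 4, 4, 2),  # scratch = step * error^2
        Instr("add", 1, 1, 4),  # k += step * error^2 (in-place parameter update)
    ]
    return RegisterPolicy(program, init_regs=[0.0, 1.0, 0.25, 0.0, 0.0],
                          num_obs=1, action_reg=3)
```

With a constant observed error of 2.0, three consecutive calls to `act` return 2.0, 4.0 and 6.0, because the program increases its own gain after every step; calling `reset()` restores the initial gain for a new episode. This only shows parameter adaptation inside the program's own registers; changing which computation is performed (the "alter their inference algorithm" part) would need richer instructions than this toy set provides.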