Autonomous robots deployed in the real world will need control policies that rapidly adapt to environmental changes. To this end, we propose AutoRobotics-Zero (ARZ), a method based on AutoML-Zero that discovers zero-shot adaptable policies from scratch. In contrast to neural network adaptation policies, where only model parameters are optimized, ARZ can build control algorithms with the full expressive power of a linear register machine. We evolve modular policies that tune their model parameters and alter their inference algorithm on the fly to adapt to sudden environmental changes. We demonstrate our method on a realistic simulated quadruped robot, for which we evolve safe control policies that avoid falling when individual limbs suddenly break. This is a challenging task in which two popular neural network baselines fail. Finally, we conduct a detailed analysis of our method on a novel and challenging non-stationary control task dubbed Cataclysmic Cartpole. The results confirm that ARZ is significantly more robust to sudden environmental changes and can build simple, interpretable control policies.
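To make the register-machine idea concrete, the following is a minimal sketch, under our own simplifying assumptions, of a policy expressed as a straight-line program over a persistent register memory. The instruction set (`OPS`), the `Instr` and `RegisterPolicy` classes, and the `start_episode`/`get_action` names are hypothetical illustrations, not ARZ's implementation; the real search space (vector and matrix operations, evolved programs, richer memory) is not reproduced here. Because the registers persist across timesteps, the same observation can produce different actions as the program updates its own parameters.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

# Hypothetical scalar op set for illustration only; ARZ's actual op set differs.
OPS: Dict[str, Callable[[float, float], float]] = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "max": lambda a, b: max(a, b),
}


@dataclass(frozen=True)
class Instr:
    """One register-machine instruction: regs[out] = op(regs[in1], regs[in2])."""
    op: str
    in1: int
    in2: int
    out: int


class RegisterPolicy:
    """Minimal linear register machine policy (hypothetical, not ARZ's code)."""

    def __init__(self, program: List[Instr], num_registers: int,
                 obs_slots: Sequence[int], action_slot: int,
                 init: Dict[int, float]):
        self.program = program
        self.num_registers = num_registers
        self.obs_slots = list(obs_slots)
        self.action_slot = action_slot
        self.init = dict(init)
        self.start_episode()

    def start_episode(self) -> None:
        # Reset memory to its initial values at the start of each episode.
        self.regs = [0.0] * self.num_registers
        for slot, value in self.init.items():
            self.regs[slot] = value

    def get_action(self, obs: Sequence[float]) -> float:
        # Write the observation into its input registers.
        for slot, value in zip(self.obs_slots, obs):
            self.regs[slot] = value
        # Execute the straight-line program; registers are NOT cleared between
        # steps, so earlier observations can change how later ones are mapped.
        for ins in self.program:
            self.regs[ins.out] = OPS[ins.op](self.regs[ins.in1], self.regs[ins.in2])
        return self.regs[self.action_slot]


# r0 = observation, r1 = adaptive gain (starts at 1.0), r2 = action.
program = [
    Instr("mul", 0, 1, 2),  # action = obs * gain
    Instr("max", 1, 0, 1),  # gain = max(gain, obs): a parameter tuned on the fly
]
policy = RegisterPolicy(program, num_registers=3, obs_slots=[0],
                        action_slot=2, init={1: 1.0})
print([policy.get_action([x]) for x in (0.5, 2.0, 0.5)])  # → [0.5, 2.0, 1.0]
```

In this toy program the second instruction acts as an in-episode parameter update: after observing 2.0 the gain in `r1` doubles, so the repeated observation 0.5 maps to 1.0 instead of 0.5. The sketch only illustrates on-the-fly parameter tuning; switching between inference paths would additionally require conditional or branching instructions, which are omitted here.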