Pre-training a diverse set of neural network controllers in simulation has enabled robots to adapt online to damage in locomotion tasks. However, finding diverse, high-performing controllers requires expensive network training and extensive tuning of a large number of hyperparameters. In contrast, Covariance Matrix Adaptation MAP-Annealing (CMA-MAE), a quality diversity (QD) algorithm based on evolution strategies (ES), does not have these limitations and has achieved state-of-the-art performance on standard QD benchmarks. However, CMA-MAE cannot scale to modern neural network controllers because its complexity is quadratic in the number of controller parameters. We leverage efficient approximation methods in ES to propose three new CMA-MAE variants that scale to high dimensions. Our experiments show that the variants outperform ES-based baselines in benchmark robotic locomotion tasks, while performing comparably to or better than state-of-the-art QD algorithms based on deep reinforcement learning.
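To make the scaling argument concrete, the following minimal sketch compares the storage needed for a full covariance matrix with that of a diagonal one as the number of controller parameters grows. The diagonal covariance is used here only as an illustrative approximation; the abstract does not name the specific approximation methods, and the function names, the 8-byte float size, and the parameter counts below are assumptions chosen for the example.

```python
def covariance_floats(num_params: int, diagonal: bool = False) -> int:
    """Number of floats needed to store the covariance of the search distribution.

    A full covariance matrix has num_params x num_params entries (quadratic);
    a diagonal one keeps a single variance per parameter (linear).
    """
    return num_params if diagonal else num_params * num_params


def gib(num_floats: int, bytes_per_float: int = 8) -> float:
    """Convert a float count to gibibytes, assuming 8-byte floats by default."""
    return num_floats * bytes_per_float / 2**30


if __name__ == "__main__":
    # Hypothetical parameter counts, from a tiny controller to a large network.
    for d in (100, 10_000, 1_000_000):
        full = gib(covariance_floats(d))
        diag = gib(covariance_floats(d, diagonal=True))
        print(f"d={d:>9,}  full: {full:.6g} GiB  diagonal: {diag:.6g} GiB")
```

Storage is only one part of the cost: sampling from and updating a full covariance matrix are also at least quadratic in the number of parameters, which is why linear-cost approximations become attractive for neural network controllers.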