Optimizing functions without access to gradients is the remit of black-box methods such as evolution strategies. While highly general, their learning dynamics are often heuristic and inflexible, which are exactly the limitations that meta-learning can address. Hence, we propose to discover effective update rules for evolution strategies via meta-learning. Concretely, our approach employs a search strategy parametrized by a self-attention-based architecture, which guarantees that the update rule is invariant to the ordering of the candidate solutions. We show that meta-evolving this system on a small set of representative low-dimensional analytic optimization problems is sufficient to discover new evolution strategies capable of generalizing to unseen optimization problems, population sizes, and optimization horizons. Furthermore, the same learned evolution strategy can outperform established neuroevolution baselines on supervised and continuous control tasks. As additional contributions, we ablate the individual neural network components of our method; reverse engineer the learned strategy into an explicit heuristic form, which remains highly competitive; and show that it is possible to self-referentially train an evolution strategy from scratch, with the learned update rule used to drive the outer meta-learning loop.
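To make the ordering-invariance claim concrete, the following is a minimal sketch of how self-attention over per-candidate fitness features can yield an update that does not depend on the order of the candidates. It is an illustration under stated assumptions, not the architecture described in this work: the feature choice (z-scored fitness and centered ranks), the single attention head, the names `fitness_features`, `attention_weights`, and `update_mean`, and the randomly drawn (not meta-learned) parameters are all hypothetical.

```python
import numpy as np

def fitness_features(fitness):
    """Per-candidate features: z-scored fitness and centered rank in [-0.5, 0.5].

    Assumes no exact ties in fitness, so that ranks are uniquely defined.
    """
    z = (fitness - fitness.mean()) / (fitness.std() + 1e-8)
    ranks = fitness.argsort().argsort() / (len(fitness) - 1) - 0.5
    return np.stack([z, ranks], axis=1)  # shape (N, 2)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_weights(fitness, params):
    """Single-head self-attention over the population.

    Returns one recombination weight per candidate; the weights sum to 1.
    """
    f = fitness_features(fitness)
    q, k, v = f @ params["Wq"], f @ params["Wk"], f @ params["Wv"]
    attn = softmax(q @ k.T / np.sqrt(q.shape[1]), axis=1)  # shape (N, N)
    scores = (attn @ v).squeeze(-1)                        # shape (N,)
    return softmax(scores)

def update_mean(mean, candidates, fitness, params, lr=1.0):
    """Move the search mean toward a weighted combination of the candidates."""
    w = attention_weights(fitness, params)
    return mean + lr * (w @ (candidates - mean))

# Hypothetical setup: random parameters stand in for meta-learned ones.
rng = np.random.default_rng(0)
params = {
    "Wq": rng.normal(size=(2, 8)),
    "Wk": rng.normal(size=(2, 8)),
    "Wv": rng.normal(size=(2, 1)),
}
mean = np.zeros(3)
candidates = mean + 0.5 * rng.normal(size=(6, 3))
fitness = (candidates ** 2).sum(axis=1)  # sphere function, lower is better

# Shuffle the population and check that the resulting mean update is unchanged.
perm = rng.permutation(6)
m1 = update_mean(mean, candidates, fitness, params)
m2 = update_mean(mean, candidates[perm], fitness[perm], params)
print(np.allclose(m1, m2))
```

Because attention without positional encodings is permutation-equivariant, shuffling the candidates only shuffles the weights in the same way, so the weighted sum that updates the mean stays the same up to floating-point rounding.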