Quality-Diversity (QD) algorithms have emerged as a powerful optimization paradigm that aims to generate a set of high-quality and diverse solutions. To achieve this challenging goal, QD algorithms must maintain a large archive and a large population in each iteration, which raises two main issues: sample efficiency and resource efficiency. Most advanced QD algorithms focus on improving sample efficiency, while resource efficiency is overlooked to some extent. In particular, the resource overhead during training has not yet been addressed, which hinders the wider application of QD algorithms. In this paper, we highlight this important research question, i.e., how to efficiently train QD algorithms with limited resources, and propose a novel and effective method called RefQD to address it. RefQD decomposes a neural network into a representation part and a decision part, and shares a single representation part among all decision parts in the archive to reduce the resource overhead. It also employs a series of strategies to address the mismatch between the old decision parts and the newly updated representation part. Experiments on different types of tasks, ranging from small to large resource consumption, demonstrate the excellent performance of RefQD: it not only uses significantly fewer resources (e.g., 16\% of the GPU memory on QDax and 3.7\% on Atari) but also achieves comparable or better performance compared to sample-efficient QD algorithms. Our code is available at \url{https://github.com/lamda-bbo/RefQD}.
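
To illustrate why sharing the representation part can reduce storage, the following is a minimal sketch that counts the parameters an archive holds with and without sharing. It is not the RefQD implementation: the layer sizes, the archive size, and the helper `dense_params` are hypothetical values chosen only for illustration, and the resulting ratio is not a result from the paper.

```python
# Hypothetical layer sizes, chosen only for illustration (not from the paper).
OBS_DIM, HIDDEN_DIM, ACTION_DIM = 64, 256, 8
ARCHIVE_SIZE = 1024  # assumed number of solutions stored in the QD archive


def dense_params(n_in: int, n_out: int) -> int:
    """Number of weights plus biases in one fully connected layer."""
    return n_in * n_out + n_out


# Representation part: two hidden layers, stored once and shared by the archive.
representation_params = (
    dense_params(OBS_DIM, HIDDEN_DIM) + dense_params(HIDDEN_DIM, HIDDEN_DIM)
)
# Decision part: one output layer, stored separately for each archived solution.
decision_params = dense_params(HIDDEN_DIM, ACTION_DIM)

# Baseline layout: every archived solution stores a full network.
full_archive_params = ARCHIVE_SIZE * (representation_params + decision_params)
# Shared layout: one representation part plus one decision part per solution.
shared_archive_params = representation_params + ARCHIVE_SIZE * decision_params

ratio = shared_archive_params / full_archive_params
print(f"full: {full_archive_params}, shared: {shared_archive_params}, ratio: {ratio:.3f}")
```

The sketch covers only parameter storage in the archive. It does not model the mismatch between old decision parts and an updated representation part, which the abstract says RefQD handles with additional strategies.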