Sparse Mutation Decompositions: Fine Tuning Deep Neural Networks with Subspace Evolution

Neuroevolution is a promising area of research that combines evolutionary algorithms with neural networks. A popular subclass of neuroevolutionary methods, called evolution strategies, relies on dense noise perturbations to mutate networks, which can be sample-inefficient and challenging for large models with millions of parameters. We introduce an approach to alleviating this problem by decomposing dense mutations into low-dimensional subspaces. Restricting mutations in this way can significantly reduce variance, as networks can handle stronger perturbations while maintaining performance, which enables a more controlled and targeted evolution of deep networks. This approach is uniquely effective for the task of fine-tuning pre-trained models, which is an increasingly valuable area of research as networks continue to scale in size and open-source models become more widely available. Furthermore, we show how this work naturally connects to ensemble learning, where sparse mutations encourage diversity among children such that their combined predictions can reliably improve performance. We conduct the first large-scale exploration of neuroevolutionary fine-tuning and ensembling on the notoriously difficult ImageNet dataset, where we see small generalization improvements with only a single evolutionary generation using nearly a dozen different deep neural network architectures.
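The abstract does not spell out how the low-dimensional subspaces are chosen, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a subspace is a random subset of `k` coordinates, and it uses a flattened linear layer as a toy stand-in for a pre-trained network. The names `sparse_mutation`, `ensemble_predict`, `sigma`, and `k` are hypothetical and chosen for illustration.

```python
import numpy as np

def sparse_mutation(params, sigma, k, rng):
    """Perturb only k randomly chosen coordinates of the parent.

    Assumption: the "subspace" is a random coordinate subset; a dense
    mutation would instead add noise to every entry of params.
    """
    idx = rng.choice(params.size, size=k, replace=False)
    child = params.copy()
    child[idx] += sigma * rng.standard_normal(k)
    return child

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ensemble_predict(children, x, n_classes):
    """Average the class probabilities predicted by each child.

    Each child is a flattened (n_features, n_classes) weight matrix,
    a toy stand-in for a deep network.
    """
    probs = [softmax(x @ c.reshape(x.shape[1], n_classes)) for c in children]
    return np.mean(probs, axis=0)

rng = np.random.default_rng(0)
n_features, n_classes = 8, 3

# Stand-in for pre-trained weights (hypothetical values).
parent = rng.standard_normal(n_features * n_classes)

# One generation: five children, each mutated in only 4 of 24 coordinates.
children = [sparse_mutation(parent, sigma=0.5, k=4, rng=rng) for _ in range(5)]

x = rng.standard_normal((10, n_features))
avg_probs = ensemble_predict(children, x, n_classes)
```

Because each child differs from the parent in only a few coordinates, a larger `sigma` can be used without moving far from the pre-trained solution, and children drawn with different coordinate subsets give the ensemble some diversity. Whether that translates into better accuracy depends on the model and data, which this toy example does not measure.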

