Stitching for Neuroevolution: Recombining Deep Neural Networks without Breaking Them

Traditional approaches to neuroevolution often start from scratch. This becomes prohibitively expensive in terms of computational and data requirements when targeting modern, deep neural networks. A warm start could be highly advantageous, e.g., starting from previously trained networks, potentially from different sources. This also makes it possible to leverage the benefits of transfer learning, in particular vastly reduced training effort. However, recombining trained networks is non-trivial because architectures and feature representations typically differ. Consequently, a straightforward exchange of layers tends to lead to a performance breakdown. We overcome this by matching the layers of parent networks based on their connectivity, which identifies potential crossover points. To correct for differing feature representations between these layers, we employ stitching, which merges the networks by introducing new layers at crossover points. To train the merged network, only the stitching layers need to be considered. New networks can then be created by choosing which stitching layers to use, which selects a subnetwork of the merged network. Assessing their performance is efficient because it requires only evaluating them on data. We experimentally show that our approach enables finding networks that represent novel trade-offs between performance and computational cost, with some even dominating the original networks.
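
The subnetwork selection described above can be illustrated with a toy sketch. It is not the authors' implementation: the two parents, their layer widths, the names `parents`, `stitches`, `forward`, `cost`, and `dominates`, and the multiply-accumulate cost proxy are all assumptions made for illustration. Random weights stand in for trained ones, and the stitching layers are untrained placeholders. The layers are assumed to be already matched one-to-one, so the connectivity-based matching step is not shown.

```python
import itertools
import random


def linear(weights, x):
    """Apply a dense layer stored as a list of rows: y = W x."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def relu(x):
    return [max(0.0, v) for v in x]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)

# Two hypothetical parents: same depth, different hidden widths (3 vs. 2).
# Random weights stand in for previously trained ones.
parents = {
    "A": [random_matrix(3, 2, rng), random_matrix(3, 3, rng), random_matrix(1, 3, rng)],
    "B": [random_matrix(2, 2, rng), random_matrix(2, 2, rng), random_matrix(1, 2, rng)],
}
hidden_width = {"A": 3, "B": 2}

# One stitching layer per direction at each crossover point (after layers 0 and 1).
# Random placeholders here; in practice only these layers would be trained.
stitches = {
    (src, dst, point): random_matrix(hidden_width[dst], hidden_width[src], rng)
    for point in (0, 1)
    for src, dst in (("A", "B"), ("B", "A"))
}


def forward(path, x):
    """Run the subnetwork named by `path`, e.g. ("A", "B", "B")."""
    for i, name in enumerate(path):
        x = linear(parents[name][i], x)
        if i < len(path) - 1:
            x = relu(x)
            nxt = path[i + 1]
            if nxt != name:  # switching parents: route through the stitching layer
                x = linear(stitches[(name, nxt, i)], x)
    return x


def cost(path):
    """Proxy for computational cost: multiply-accumulates, stitches included."""
    def macs(m):
        return len(m) * len(m[0])

    total = sum(macs(parents[name][i]) for i, name in enumerate(path))
    total += sum(
        macs(stitches[(a, b, i)])
        for i, (a, b) in enumerate(zip(path, path[1:]))
        if a != b
    )
    return total


def dominates(a, b):
    """True if objective tuple a (lower is better) Pareto-dominates b."""
    return all(x <= y for x, y in zip(a, b)) and a != b


# Every choice of which stitching layers to use yields one candidate network.
offspring = list(itertools.product("AB", repeat=3))
for path in offspring:
    print(path, cost(path), forward(path, [1.0, -0.5]))
```

Each candidate is scored with forward passes only, which mirrors the claim that assessing offspring requires nothing beyond evaluation on data. Given an (error, cost) pair per candidate, `dominates` expresses what it means for an offspring to dominate a parent: no worse in both objectives and strictly better in at least one.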

