Reusing Convolutional Neural Network Models through Modularization and Composition

Source: arXiv. Accepted by ACM Transactions on Software Engineering and Methodology (TOSEM). arXiv admin note: substantial text overlap with arXiv:2209.06116

With the widespread success of deep learning technologies, many trained deep neural network (DNN) models are now publicly available. However, directly reusing these public DNN models for new tasks often fails because their functionality or performance does not match the new task. Inspired by the notion of modularization and composition in software reuse, we investigate the possibility of improving the reusability of DNN models in a more fine-grained manner. Specifically, we propose two modularization approaches, CNNSplitter and GradSplitter, which decompose a trained convolutional neural network (CNN) model for $N$-class classification into $N$ small reusable modules. Each module recognizes one of the $N$ classes and contains a subset of the convolution kernels of the trained CNN model. The resulting modules can then be reused to patch existing CNN models or to build new CNN models through composition. The main difference between CNNSplitter and GradSplitter lies in their search methods: the former relies on a genetic algorithm to explore the search space, while the latter uses a gradient-based search method. Our experiments with three representative CNNs on three widely used public datasets demonstrate the effectiveness of the proposed approaches. Compared with CNNSplitter, GradSplitter incurs less accuracy loss, produces much smaller modules (19.88% fewer kernels), and achieves better results on patching weak models. In particular, experiments on GradSplitter show that (1) patching weak models improves precision, recall, and F1-score by 17.13%, 4.95%, and 11.47% on average, respectively, and (2) for a new task, reusing modules achieves accuracy similar to that of models trained from scratch (the average accuracy loss is only 2.46%) without a costly training process. Our approaches provide a viable solution for the rapid development and improvement of CNN models.
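The idea of one-class modules that each keep only part of the kernels, and of composing them into a classifier, can be illustrated with a toy sketch. This is not the authors' implementation: the names `apply_mask`, `module_score`, `compose_predict`, and `kernel_reduction` are hypothetical, each kernel's feature map is reduced to a single number, the linear scoring head is an assumption, and the masks are set by hand rather than found by the genetic or gradient-based search that CNNSplitter and GradSplitter perform.

```python
from typing import List, Sequence

Mask = List[int]  # hypothetical representation: 1 keeps a kernel, 0 drops it


def apply_mask(features: Sequence[float], mask: Mask) -> List[float]:
    """Zero out the outputs of kernels that a module does not retain."""
    return [f * m for f, m in zip(features, mask)]


def module_score(features: Sequence[float], mask: Mask,
                 head_weights: Sequence[float]) -> float:
    """Score for one class, computed only from the kernels the module keeps."""
    kept = apply_mask(features, mask)
    return sum(w * f for w, f in zip(head_weights, kept))


def compose_predict(features: Sequence[float], masks: Sequence[Mask],
                    head: Sequence[Sequence[float]]) -> int:
    """Compose N one-class modules: the class whose module scores highest wins."""
    scores = [module_score(features, masks[c], head[c])
              for c in range(len(masks))]
    return max(range(len(scores)), key=scores.__getitem__)


def kernel_reduction(mask: Mask) -> float:
    """Fraction of the original kernels that the module leaves out."""
    return 1 - sum(mask) / len(mask)


# Toy model (assumed values): 4 kernels, 2 classes.
head = [[1.0, 0.5, 0.0, 0.0],   # assumed head weights for class 0
        [0.0, 0.0, 0.8, 1.0]]   # assumed head weights for class 1
masks = [[1, 1, 0, 0],          # module 0 keeps kernels 0 and 1
         [0, 0, 1, 1]]          # module 1 keeps kernels 2 and 3

sample_a = [0.9, 0.2, 0.1, 0.3]  # kernel responses for one input
sample_b = [0.1, 0.0, 0.7, 0.6]  # kernel responses for another input

pred_a = compose_predict(sample_a, masks, head)
pred_b = compose_predict(sample_b, masks, head)
```

In this sketch each module drops half of the kernels, and the composed classifier picks whichever module responds most strongly. Building a model for a new task would, under the same assumptions, amount to selecting the masks and head rows of the classes that the task needs.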

