Target Aware Network Architecture Search and Compression for Efficient Knowledge Transfer

Transfer learning enables Convolutional Neural Networks (CNNs) to acquire knowledge from a source domain and transfer it to a target domain, where collecting large-scale annotated examples is time-consuming and expensive. Conventionally, when knowledge learned on one task is transferred to another, the deeper layers of a pre-trained CNN are fine-tuned on the target dataset. However, these layers were designed for the source task and may be over-parameterized for the target task. Fine-tuning them on the target dataset can therefore hurt the generalization ability of the CNN because of its high network complexity. To tackle this problem, we propose a two-stage framework called TASCNet, which enables efficient knowledge transfer. In the first stage, the configuration of the deeper layers is learned automatically and fine-tuned on the target dataset. In the second stage, redundant filters are pruned from the fine-tuned CNN to decrease the network's complexity for the target task while preserving performance. This two-stage mechanism finds a compact version of the pre-trained CNN with an optimal structure (number of filters in a convolutional layer, number of neurons in a dense layer, and so on) from the hypothesis space. The efficacy of the proposed method is evaluated using VGG-16, ResNet-50, and DenseNet-121 on the CalTech-101, CalTech-256, and Stanford Dogs datasets. Beyond these computer vision tasks, we also conduct experiments on the Movie Review Sentiment Analysis task. The proposed TASCNet reduces the computational complexity of pre-trained CNNs on the target task by reducing both trainable parameters and FLOPs, which enables resource-efficient knowledge transfer. The source code is available at: https://github.com/Debapriya-Tula/TASCNet.
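
To make the claim about reduced parameters and FLOPs concrete, the following is a minimal sketch of how pruning filters shrinks both counts in a stack of convolutional layers. It is not the TASCNet implementation: the abstract does not state how redundant filters are selected, so the sketch only models the bookkeeping. The names `ConvLayer` and `prune_filters`, the keep ratios, and the convention of counting one multiply-accumulate as one FLOP are all assumptions made for illustration. The layer shapes correspond to two 3x3, 512-filter convolutions on a 14x14 feature map, as found in the last block of VGG-16.

```python
from dataclasses import dataclass


@dataclass
class ConvLayer:
    """Shape description of one convolutional layer (hypothetical helper)."""
    in_channels: int
    out_channels: int  # number of filters
    kernel_size: int
    out_height: int
    out_width: int

    def params(self) -> int:
        # Weights of every filter plus one bias per filter.
        return self.out_channels * (self.in_channels * self.kernel_size ** 2 + 1)

    def flops(self) -> int:
        # Assumption: one multiply-accumulate is counted as one FLOP.
        return (self.out_height * self.out_width * self.out_channels
                * self.in_channels * self.kernel_size ** 2)


def prune_filters(layers, keep_ratios):
    """Keep a fraction of the filters in each layer.

    Removing filters from one layer also removes the matching input
    channels of the next layer, so the savings compound.
    """
    pruned = []
    in_channels = layers[0].in_channels
    for layer, ratio in zip(layers, keep_ratios):
        kept = max(1, round(layer.out_channels * ratio))
        pruned.append(ConvLayer(in_channels, kept, layer.kernel_size,
                                layer.out_height, layer.out_width))
        in_channels = kept
    return pruned


# Two 3x3 convolutions with 512 filters on a 14x14 feature map.
layers = [ConvLayer(512, 512, 3, 14, 14), ConvLayer(512, 512, 3, 14, 14)]
# Illustrative keep ratios; a real method would derive these from the data.
pruned = prune_filters(layers, [0.5, 0.5])

params_before = sum(layer.params() for layer in layers)
params_after = sum(layer.params() for layer in pruned)
flops_before = sum(layer.flops() for layer in layers)
flops_after = sum(layer.flops() for layer in pruned)

print(f"parameters: {params_before} -> {params_after}")
print(f"FLOPs:      {flops_before} -> {flops_after}")
```

In this sketch, halving the filters of both layers cuts the second layer's cost to roughly a quarter, because it loses input channels as well as filters. This compounding is why filter pruning can reduce parameters and FLOPs by more than the fraction of filters removed.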
