Convolution-BatchNorm (ConvBN) blocks are integral components in computer vision and other domains. A ConvBN block can operate in three modes: Train, Eval, and Deploy. Train mode is indispensable for training models from scratch, Eval mode is suited to transfer learning and model validation, and Deploy mode is designed for model deployment. This paper focuses on the trade-off between stability and efficiency in ConvBN blocks: Deploy mode is efficient but suffers from training instability, while Eval mode is widely used in transfer learning but lacks efficiency. To resolve this dilemma, we theoretically reveal the reason behind the diminished training stability observed in Deploy mode. We then propose a novel Tune mode to bridge the gap between Eval mode and Deploy mode. The proposed Tune mode is as stable as Eval mode for transfer learning, and its computational efficiency closely matches that of Deploy mode. Through extensive experiments on object detection and classification tasks, carried out across various datasets and model architectures, we demonstrate that Tune mode preserves the original performance while significantly reducing GPU memory footprint and training time, thereby contributing an efficient solution to transfer learning with convolutional networks.
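The abstract does not spell out how the modes differ computationally, so the following is only a minimal sketch under stated assumptions. It assumes that Eval mode applies the convolution and then BatchNorm with frozen running statistics, and that Deploy mode uses the standard trick of folding those statistics into the convolution weights. A 1x1 convolution written in NumPy stands in for a general convolution, and every function name here (`conv1x1`, `convbn_eval`, `fold_bn`) is hypothetical. The sketch only shows that the two forward passes agree; it does not reproduce the proposed Tune mode or the stability analysis.

```python
import numpy as np

def conv1x1(x, weight, bias):
    # x: (N, C_in, H, W), weight: (C_out, C_in), bias: (C_out,)
    # A 1x1 convolution is a per-pixel linear map over channels.
    return np.einsum("oc,nchw->nohw", weight, x) + bias[None, :, None, None]

def convbn_eval(x, weight, bias, gamma, beta, mean, var, eps=1e-5):
    # Assumed Eval mode: convolution, then BatchNorm with frozen statistics.
    # The intermediate feature map y is materialized before normalization.
    y = conv1x1(x, weight, bias)
    scale = gamma / np.sqrt(var + eps)
    shift = beta - mean * scale
    return y * scale[None, :, None, None] + shift[None, :, None, None]

def fold_bn(weight, bias, gamma, beta, mean, var, eps=1e-5):
    # Assumed Deploy mode: fold the BatchNorm affine map into the
    # convolution parameters, so only one convolution is executed.
    scale = gamma / np.sqrt(var + eps)
    folded_weight = weight * scale[:, None]
    folded_bias = (bias - mean) * scale + beta
    return folded_weight, folded_bias

# Illustrative random inputs and parameters (not from the paper).
rng = np.random.default_rng(0)
x = rng.normal(size=(2, 3, 4, 4))
weight = rng.normal(size=(5, 3))
bias = rng.normal(size=5)
gamma = rng.normal(size=5)
beta = rng.normal(size=5)
mean = rng.normal(size=5)
var = rng.uniform(0.5, 2.0, size=5)

eval_out = convbn_eval(x, weight, bias, gamma, beta, mean, var)
folded_weight, folded_bias = fold_bn(weight, bias, gamma, beta, mean, var)
deploy_out = conv1x1(x, folded_weight, folded_bias)

# Largest elementwise gap between the two forward passes.
max_diff = float(np.max(np.abs(eval_out - deploy_out)))
```

Under these assumptions the two outputs match up to floating-point error, while the folded version skips the separate normalization pass over the intermediate feature map. That is one plausible source of the efficiency gap the abstract describes.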