Convolution-BatchNorm (ConvBN) blocks are integral components in various computer vision tasks and other domains. A ConvBN block can operate in three modes: Train, Eval, and Deploy. While the Train mode is indispensable for training models from scratch, the Eval mode is suitable for transfer learning and beyond, and the Deploy mode is designed for the deployment of models. This paper focuses on the trade-off between stability and efficiency in ConvBN blocks: Deploy mode is efficient but suffers from training instability, whereas Eval mode is widely used in transfer learning but lacks efficiency. To resolve this dilemma, we theoretically reveal the reason behind the diminished training stability observed in the Deploy mode. We then propose a novel Tune mode to bridge the gap between Eval mode and Deploy mode. The proposed Tune mode is as stable as Eval mode for transfer learning, and its computational efficiency closely matches that of the Deploy mode. Through extensive experiments in object detection, classification, and adversarial example generation across $5$ datasets and $12$ model architectures, we demonstrate that the proposed Tune mode preserves performance while significantly reducing GPU memory footprint and training time, thereby contributing efficient ConvBN blocks for transfer learning and beyond. Our method has been integrated into both PyTorch (a general machine learning framework) and MMCV/MMEngine (a computer vision framework). Thanks to PyTorch's built-in machine learning compilers, practitioners need only one line of code to use our efficient ConvBN blocks.
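To make the Eval and Deploy modes concrete, the following is a minimal sketch, not the paper's implementation. It assumes the standard BatchNorm folding identity and, for brevity, a 1x1 convolution applied at a single pixel, so the convolution reduces to a matrix-vector product. All function names and numeric values are hypothetical and chosen only for illustration; the Tune mode itself is not shown because the abstract does not specify how it is computed.

```python
import math

def conv1x1(weight, bias, x):
    """1x1 convolution at one pixel: out[o] = sum_i weight[o][i] * x[i] + bias[o]."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]

def eval_mode(weight, bias, gamma, beta, mean, var, x, eps=1e-5):
    """Eval mode: convolution, then BatchNorm using frozen running statistics."""
    z = conv1x1(weight, bias, x)
    return [g * (zi - m) / math.sqrt(v + eps) + bt
            for zi, g, bt, m, v in zip(z, gamma, beta, mean, var)]

def fold_bn(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold the BatchNorm affine transform into the convolution parameters."""
    scale = [g / math.sqrt(v + eps) for g, v in zip(gamma, var)]
    folded_w = [[s * w for w in row] for s, row in zip(scale, weight)]
    folded_b = [s * (b - m) + bt for s, b, m, bt in zip(scale, bias, mean, beta)]
    return folded_w, folded_b

def deploy_mode(folded_w, folded_b, x):
    """Deploy mode: a single convolution with pre-folded parameters."""
    return conv1x1(folded_w, folded_b, x)

# Hypothetical parameters: 2 output channels, 3 input channels.
weight = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.75]]
bias = [0.1, -0.2]
gamma, beta = [1.2, 0.8], [0.05, -0.3]
mean, var = [0.4, -0.1], [2.0, 0.5]
x = [1.0, 2.0, -1.0]

y_eval = eval_mode(weight, bias, gamma, beta, mean, var, x)
y_deploy = deploy_mode(*fold_bn(weight, bias, gamma, beta, mean, var), x)
print(y_eval, y_deploy)
```

Under these assumptions the two modes compute the same function in the forward pass. The difference lies in what is stored and executed: Eval mode runs two operations and keeps the intermediate convolution output, while Deploy mode runs one operation on folded parameters.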