As a fundamental operation in modern machine vision models, feature upsampling has been widely used and studied in the literature. An ideal upsampling operation should be lightweight, with low computational complexity: it should improve overall performance without increasing model complexity. Content-aware Reassembly of Features (CARAFE) is a well-designed learnable operation for feature upsampling. Despite its encouraging performance, this method requires generating large-scale kernels, which introduces a large number of redundant parameters and inherently limits scalability. To this end, we propose a lightweight upsampling operation, termed Dynamic Lightweight Upsampling (DLU), in this paper. Specifically, it first constructs a small-scale source kernel space and then samples large-scale kernels from that space by introducing learnable guidance offsets, thereby avoiding a large collection of trainable parameters in upsampling. Experiments on several mainstream vision tasks show that DLU achieves performance comparable to, or even better than, the original CARAFE at much lower complexity: e.g., DLU requires 91% fewer parameters and at least 63% fewer FLOPs (floating-point operations) than CARAFE in the case of 16x upsampling, yet outperforms CARAFE by 0.3% mAP in object detection. Code is available at https://github.com/Fu0511/Dynamic-Lightweight-Upsampling.
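To make the core idea concrete, the following is a minimal NumPy sketch of the sampling step described above: a small trainable source kernel space is bilinearly sampled at a uniform grid perturbed by learnable guidance offsets to produce a large reassembly kernel, which is then softmax-normalized as in CARAFE. All function names, shapes, and the bilinear-sampling choice are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def bilinear_sample(space, ys, xs):
    """Bilinearly sample a 2-D array `space` at float coordinates (ys, xs)."""
    h, w = space.shape
    y0 = np.clip(np.floor(ys).astype(int), 0, h - 2)
    x0 = np.clip(np.floor(xs).astype(int), 0, w - 2)
    dy = np.clip(ys - y0, 0.0, 1.0)
    dx = np.clip(xs - x0, 0.0, 1.0)
    return (space[y0, x0] * (1 - dy) * (1 - dx)
            + space[y0, x0 + 1] * (1 - dy) * dx
            + space[y0 + 1, x0] * dy * (1 - dx)
            + space[y0 + 1, x0 + 1] * dy * dx)

def sample_large_kernel(source_space, offsets, k):
    """Sample a k x k reassembly kernel from a small source kernel space.

    source_space: (s, s) trainable parameters, with s much smaller than k,
                  so the parameter count stays small (illustrative).
    offsets:      (k, k, 2) learnable guidance offsets, in source coordinates.
    """
    s = source_space.shape[0]
    # Base grid: k x k sampling positions spread uniformly over the source space.
    base = np.linspace(0.0, s - 1.0, k)
    gy, gx = np.meshgrid(base, base, indexing="ij")
    # Learnable offsets shift where each kernel entry reads from.
    ys = gy + offsets[..., 0]
    xs = gx + offsets[..., 1]
    kernel = bilinear_sample(source_space, ys, xs)
    # Softmax normalization so the reassembly kernel sums to one, as in CARAFE.
    e = np.exp(kernel - kernel.max())
    return e / e.sum()
```

The key property of this sketch is that the trainable parameter count scales with the small source space (s x s plus the offsets), rather than with the full large-kernel output as in CARAFE's kernel-prediction branch.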