Multi-scale features are essential for dense prediction tasks, including object detection, instance segmentation, and semantic segmentation. Existing state-of-the-art methods usually first extract multi-scale features with a classification backbone and then fuse these features with a lightweight module (e.g., the fusion module in FPN). However, we argue that fusing multi-scale features through such a paradigm may not be sufficient, because the parameters allocated to feature fusion are limited compared with the heavy classification backbone. To address this issue, we propose a new architecture named Cascade Fusion Network (CFNet) for dense prediction. Besides the stem and several blocks used to extract initial high-resolution features, we introduce several cascaded stages to generate multi-scale features in CFNet. Each stage includes a sub-backbone for feature extraction and an extremely lightweight transition block for feature integration. This design makes it possible to fuse features more deeply and effectively, using a large proportion of the parameters of the whole backbone. Extensive experiments on object detection, instance segmentation, and semantic segmentation validate the effectiveness of the proposed CFNet. Code will be available at https://github.com/zhanggang001/CFNet.
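The cascaded structure described in the abstract can be illustrated with a minimal shape-tracking sketch. It computes no real features; it only follows the stride and spatial size of the feature maps through a stem, a few cascaded stages, and their transition blocks. The abstract gives no strides, stage counts, or fusion operators, so every specific below is an assumption made for illustration: the stride-4 stem, the extra downsampling to stride 8, the three pyramid levels per stage, the choice of three stages, and all names (`Feature`, `sub_backbone`, `transition`, `cfnet_shapes`). It should not be read as the authors' implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """Shape descriptor for a feature map (no tensor data is stored)."""
    stride: int   # downsampling factor relative to the input image
    height: int
    width: int


def downsample(f, factor=2):
    """Reduce spatial resolution by `factor`, increasing the stride accordingly."""
    return Feature(f.stride * factor, f.height // factor, f.width // factor)


def sub_backbone(f, num_levels=3):
    """Assumed sub-backbone: build a pyramid whose levels halve in resolution."""
    levels = [f]
    for _ in range(num_levels - 1):
        levels.append(downsample(levels[-1]))
    return levels


def transition(levels):
    """Assumed transition block: merge all levels at the finest resolution.

    A real block would upsample the coarser maps and combine them; here only
    the resulting shape is returned.
    """
    finest = levels[0]
    return Feature(finest.stride, finest.height, finest.width)


def cfnet_shapes(image_hw=(512, 512), num_stages=3):
    """Return the pyramid shapes produced by the last cascaded stage."""
    h, w = image_hw
    x = Feature(1, h, w)
    x = downsample(x, 4)  # stem, assumed to have stride 4
    x = downsample(x, 2)  # assumed move to stride 8 before the cascaded stages
    pyramid = []
    for _ in range(num_stages):
        pyramid = sub_backbone(x)  # feature extraction at several scales
        x = transition(pyramid)    # lightweight integration, fed to the next stage
    return pyramid


if __name__ == "__main__":
    for feat in cfnet_shapes():
        print(feat)
```

Under these assumptions, fusion happens once per stage, so every sub-backbone after the first operates on already-fused features. This is the sense in which a large share of the backbone's parameters takes part in feature fusion.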