Multi-scale features are essential for dense prediction tasks such as object detection, instance segmentation, and semantic segmentation. Prevailing methods typically use a classification backbone to extract multi-scale features and then fuse them with a lightweight module (e.g., the fusion modules in FPN and BiFPN, two typical object detection methods). However, because these methods allocate most of their computation to the classification backbone, multi-scale feature fusion is delayed, which may lead to inadequate fusion. Some methods do fuse features from early stages, but they either fail to fully leverage high-level features to guide low-level feature learning or have complex structures, resulting in sub-optimal performance. We propose a streamlined cascade encoder-decoder network, dubbed CEDNet, tailored for dense prediction tasks. All stages in CEDNet share the same encoder-decoder structure and perform multi-scale feature fusion within the decoder. A hallmark of CEDNet is its ability to incorporate high-level features from early stages to guide low-level feature learning in subsequent stages, thereby enhancing the effectiveness of multi-scale feature fusion. We explored three well-known encoder-decoder structures: Hourglass, UNet, and FPN. When integrated into CEDNet, they performed much better than traditional methods that combine a pre-designed classification backbone with a lightweight fusion module. Extensive experiments on object detection, instance segmentation, and semantic segmentation demonstrated the effectiveness of our method. The code is available at https://github.com/zhanggang001/CEDNet.
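To make the cascade idea concrete, the following is a minimal, shape-level sketch and not the released CEDNet code. It assumes 1D features stored as plain Python lists, pair averaging as a stand-in for a learned strided convolution, nearest-neighbour upsampling, and FPN-style top-down fusion by addition. It also assumes that each stage starts from the fused high-resolution output of the previous stage. All function names (`downsample`, `upsample`, `encoder`, `decoder`, `cascade`) are hypothetical and chosen for illustration only.

```python
from typing import List

Feature = List[float]


def downsample(x: Feature) -> Feature:
    """Halve the resolution by averaging adjacent pairs (stand-in for a strided conv)."""
    return [(x[i] + x[i + 1]) / 2.0 for i in range(0, len(x) - 1, 2)]


def upsample(x: Feature) -> Feature:
    """Double the resolution by nearest-neighbour repetition."""
    return [v for v in x for _ in range(2)]


def encoder(x: Feature, num_scales: int) -> List[Feature]:
    """Return features ordered from high resolution (low level) to low resolution (high level)."""
    feats = [x]
    for _ in range(num_scales - 1):
        feats.append(downsample(feats[-1]))
    return feats


def decoder(feats: List[Feature]) -> List[Feature]:
    """Top-down fusion: every level adds the upsampled, already-fused coarser level."""
    fused = [feats[-1]]
    for f in reversed(feats[:-1]):
        top = upsample(fused[0])
        fused.insert(0, [a + b for a, b in zip(f, top)])
    return fused


def cascade(x: Feature, num_stages: int = 3, num_scales: int = 3) -> List[Feature]:
    """Run identical encoder-decoder stages back to back.

    Assumption: the fused high-resolution output of one stage is the input of
    the next, so later stages see low-level features that already carry
    high-level context. len(x) should be divisible by 2 ** (num_scales - 1).
    """
    outputs: List[Feature] = []
    for _ in range(num_stages):
        outputs = decoder(encoder(x, num_scales))
        x = outputs[0]
    return outputs


if __name__ == "__main__":
    pyramid = cascade([1.0] * 8, num_stages=2, num_scales=3)
    print([len(level) for level in pyramid])
```

In this toy setting, fusion happens inside every stage rather than once after a long backbone, which is the contrast the abstract draws with backbone-plus-fusion-module designs. A real model would replace the averaging and addition with learned blocks.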