Feature pyramids have been widely adopted in convolutional neural networks (CNNs) and Transformers for tasks such as medical image segmentation and object detection. However, existing models generally concentrate on the encoder-side Transformer for feature extraction, while a well-designed decoder can unlock further potential. We propose CFPFormer, a novel decoder block that integrates feature pyramids with Transformers. Specifically, by leveraging patch embedding, cross-layer feature concatenation, and a Gaussian attention mechanism, CFPFormer enhances feature extraction while promoting generalization across diverse tasks. Benefiting from the Transformer structure and U-shaped skip connections, our model captures long-range dependencies and effectively up-samples feature maps. Compared with existing methods, it achieves superior performance in detecting small objects. We evaluate CFPFormer on medical image segmentation datasets and object detection benchmarks (VOC 2007, VOC 2012, MS-COCO), demonstrating its effectiveness and versatility. On the ACDC Post-2017-MICCAI-Challenge online test set, our model attains highly competitive accuracy, and it also performs well against the original decoder setting on the Synapse multi-organ segmentation dataset.
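To make the abstract's ingredients concrete, the sketch below shows one plausible reading of a CFPFormer-style decoder block in PyTorch: upsampling, cross-layer (skip) feature concatenation, a patch-token view of the fused map, and self-attention whose logits are modulated by a Gaussian distance prior. This is a minimal illustration under our own assumptions, not the authors' released implementation; all module names, the 1D token-distance prior, and the hyperparameters are ours.

```python
# Minimal sketch (assumptions, not the official CFPFormer code) of a
# decoder block combining skip-feature concatenation with Gaussian attention.
import torch
import torch.nn as nn

class GaussianAttention(nn.Module):
    """Self-attention with a learnable Gaussian bias over token distance."""
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        # One learnable bandwidth per head for the Gaussian prior (assumption).
        self.log_sigma = nn.Parameter(torch.zeros(num_heads))

    def forward(self, x):                       # x: (B, N, C)
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)    # each: (B, heads, N, head_dim)
        attn = (q @ k.transpose(-2, -1)) * self.scale
        # Gaussian prior: penalize logits by squared distance between token indices.
        idx = torch.arange(N, device=x.device, dtype=x.dtype)
        dist2 = (idx[None, :] - idx[:, None]) ** 2          # (N, N)
        sigma2 = self.log_sigma.exp().pow(2).view(1, -1, 1, 1)
        attn = attn - dist2 / (2 * sigma2)      # broadcasts to (B, heads, N, N)
        out = attn.softmax(dim=-1) @ v          # (B, heads, N, head_dim)
        out = out.transpose(1, 2).reshape(B, N, C)
        return self.proj(out)

class DecoderBlock(nn.Module):
    """Upsample, fuse the cross-layer encoder feature, then attend."""
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)
        self.fuse = nn.Conv2d(out_ch + skip_ch, out_ch, kernel_size=1)
        self.attn = GaussianAttention(out_ch)

    def forward(self, x, skip):   # x: (B,in_ch,H,W); skip: (B,skip_ch,2H,2W)
        x = self.up(x)                                   # up-sample feature map
        x = self.fuse(torch.cat([x, skip], dim=1))       # cross-layer concatenation
        B, C, H, W = x.shape
        tokens = x.flatten(2).transpose(1, 2)            # (B, H*W, C) patch-token view
        tokens = tokens + self.attn(tokens)              # residual Gaussian attention
        return tokens.transpose(1, 2).reshape(B, C, H, W)
```

For example, `DecoderBlock(256, 128, 128)` would fuse a `(B, 256, 14, 14)` decoder feature with a `(B, 128, 28, 28)` encoder skip feature; the Gaussian prior biases each token toward nearby tokens, which is one way such a mechanism could help localize small objects.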