Polyp segmentation has gained significant importance in recent years, and many methods based on CNNs, Vision Transformers, and Transformers have achieved competitive results. However, these methods often struggle with out-of-distribution datasets, missing boundaries, and small polyps. In 2022, Meta-Former was introduced as a new baseline for vision that not only improved performance on multi-task computer vision but also addressed limitations of the Vision Transformer and CNN families of backbones. To further enhance segmentation, we propose fusing Meta-Former with UNet and introduce a Multi-scale Upsampling block with a level-up combination in the decoder stage to enhance texture. We also propose the Convformer block, based on the idea of Meta-Former, to enhance the crucial information in local features. Together, these blocks combine global information, such as the overall shape of the polyp, with local and boundary information, which is crucial for decisions in medical segmentation. Our approach achieves competitive performance and state-of-the-art results on the CVC-300, Kvasir, and CVC-ColonDB datasets. Of these, all except Kvasir-SEG are out-of-distribution datasets. The implementation is available at: https://github.com/huyquoctrinh/MetaPolyp-CBMS2023.
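To make the idea of a level-up combination in the decoder more concrete, the following is a minimal sketch, not the authors' implementation (the repository linked above is the reference). It assumes that "level-up" means each coarser feature map is upsampled by a factor of 2 and added element-wise to the next finer one. The names `upsample_nearest`, `add_grids`, and `level_up_fuse` are hypothetical, and feature maps are reduced to single-channel grids of floats; the learned convolutions and the Convformer block are omitted entirely.

```python
from typing import List

Grid = List[List[float]]  # a single-channel feature map (assumption: no channel axis)


def upsample_nearest(feature: Grid, factor: int) -> Grid:
    """Nearest-neighbour upsampling: repeat each value `factor` times along both axes."""
    out: Grid = []
    for row in feature:
        wide = [value for value in row for _ in range(factor)]
        # Emit `factor` independent copies of the widened row.
        out.extend(list(wide) for _ in range(factor))
    return out


def add_grids(a: Grid, b: Grid) -> Grid:
    """Element-wise sum of two feature maps of identical shape."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("feature maps must have the same shape")
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def level_up_fuse(levels: List[Grid]) -> Grid:
    """Fuse a pyramid ordered fine -> coarse, where each level is assumed to be
    half the height and width of the previous one.

    Starting from the coarsest map, each step upsamples the running result by 2
    and adds it to the next finer level (the assumed "level-up" step).
    """
    if not levels:
        raise ValueError("at least one feature level is required")
    fused = levels[-1]
    for finer in reversed(levels[:-1]):
        fused = add_grids(finer, upsample_nearest(fused, 2))
    return fused


# Toy pyramid: 4x4 (fine), 2x2 (middle), 1x1 (coarse).
fine = [[1.0] * 4 for _ in range(4)]
middle = [[1.0, 2.0], [3.0, 4.0]]
coarse = [[10.0]]
fused = level_up_fuse([fine, middle, coarse])
```

In this toy pyramid the coarse value, standing in for global shape information, reaches every position of the finest map, while the finer levels contribute position-specific detail. Under the stated assumption, this is the intuition behind combining global with local and boundary information.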