CNNs, most notably the UNet, are the default architecture for biomedical segmentation. Transformer-based approaches, such as UNETR, have been proposed to replace them, benefiting from a global field of view, but suffering from longer runtimes and higher parameter counts. The recent Vision Mamba architecture offers a compelling alternative to transformers, also providing a global field of view, but at higher efficiency. Here, we introduce ViM-UNet, a novel segmentation architecture based on Vision Mamba, and compare it to UNet and UNETR on two challenging microscopy instance segmentation tasks. We find that it performs similarly to or better than UNet, depending on the task, and outperforms UNETR while being more efficient. Our code is open source and documented at https://github.com/constantinpape/torch-em/blob/main/vimunet.md.