Cracks pose safety risks to infrastructure and cannot be overlooked. Existing crack segmentation networks are predominantly built on CNNs or Transformers. However, CNNs lack global modeling capability, which hinders the representation of entire crack features, while Transformers capture long-range dependencies at the cost of computational complexity that grows quadratically with sequence length. Recently, Mamba has garnered extensive attention for its linear computational and memory complexity and its strong global perception. This study explores Mamba's capability to represent crack features. Specifically, this paper uncovers the connection between Mamba and the attention mechanism, offering an attention perspective for interpreting Mamba, and, following the design principles of attention blocks, devises a novel Mamba module named CrackMamba. We compare CrackMamba with the most prominent visual Mamba modules, Vim and VMamba, on two datasets: one of asphalt and concrete pavement cracks and one of steel cracks. The quantitative results show that CrackMamba is the only Mamba block that consistently improves the baseline model's performance across all evaluation measures while also reducing its parameters and computational cost. Moreover, this paper substantiates, through both theoretical analysis and visual interpretability, that Mamba can achieve global receptive fields. The findings of this study make a twofold contribution. First, as a simple yet effective plug-and-play Mamba module, CrackMamba shows considerable potential for integration into various crack segmentation models. Second, the proposed design concept of integrating Mamba with the attention mechanism offers a useful reference for Mamba-based computer vision models in general, beyond the crack segmentation networks investigated in this study.
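The attention perspective mentioned in the abstract can be illustrated with a known identity: a selective state-space scan with diagonal state transitions is equivalent to multiplying the input by a lower-triangular, data-dependent matrix that plays the role of an attention map. The sketch below is a minimal NumPy illustration under simplifying assumptions (a single channel, diagonal transitions `a`, and random stand-ins for the input-dependent `B`, `C`, and discretized decay). It is not CrackMamba's implementation, and the function names `selective_scan` and `hidden_attention_matrix` are hypothetical.

```python
import numpy as np

def selective_scan(x, a, B, C):
    """Recurrent form: h_t = a_t * h_{t-1} + B_t * x_t, y_t = <C_t, h_t>.

    x: (L,) input sequence; a, B, C: (L, N) per-step parameters.
    Runs in O(L * N), i.e. linear in sequence length.
    """
    L, N = B.shape
    h = np.zeros(N)
    y = np.empty(L)
    for t in range(L):
        h = a[t] * h + B[t] * x[t]  # diagonal transition, then input injection
        y[t] = C[t] @ h             # read-out
    return y

def hidden_attention_matrix(a, B, C):
    """Materialize M such that y = M @ x.

    M[t, s] = sum_n C[t, n] * B[s, n] * prod_{k=s+1..t} a[k, n] for s <= t, else 0.
    O(L^2) on purpose: only for inspecting the implicit "attention" weights.
    """
    L, N = B.shape
    M = np.zeros((L, L))
    for t in range(L):
        decay = np.ones(N)  # empty product when s == t
        for s in range(t, -1, -1):
            M[t, s] = C[t] @ (decay * B[s])
            decay = decay * a[s]  # extend the product to include a_s for s-1
    return M

rng = np.random.default_rng(0)
L, N = 6, 4
x = rng.standard_normal(L)
a = rng.uniform(0.5, 0.99, size=(L, N))  # assumed stand-in for exp(delta_t * A), A < 0
B = rng.standard_normal((L, N))
C = rng.standard_normal((L, N))

# Forward (causal) scan and its equivalent attention-like matrix.
y_fwd = selective_scan(x, a, B, C)
M_fwd = hidden_attention_matrix(a, B, C)

# Backward scan: reverse the sequence, scan, reverse the result.
# Reusing the same parameters is a simplification for brevity.
y_bwd = selective_scan(x[::-1], a[::-1], B[::-1], C[::-1])[::-1]
M_bwd = hidden_attention_matrix(a[::-1], B[::-1], C[::-1])[::-1, ::-1]

M_full = M_fwd + M_bwd
print(np.allclose(y_fwd, M_fwd @ x), np.allclose(y_bwd, M_bwd @ x))
print(np.count_nonzero(M_full), "of", L * L, "entries are nonzero")
```

The forward matrix is lower-triangular, so each output attends to every earlier position. Adding a reversed scan, in the spirit of bidirectional visual Mamba designs such as Vim, fills the upper triangle as well, so every output position depends on every input position. The recurrence itself stays linear in sequence length; only the explicit matrix, built here for inspection, costs O(L^2).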