YOLO-AMC: An Improved YOLO Architecture with Attention Mechanisms for Building Crack Detection

Crack detection plays an important role in infrastructure inspection and Structural Health Monitoring (SHM). However, cracks typically appear as thin, low-contrast structures and are easily affected by background noise, posing challenges for existing object detection models. This study proposes an improved YOLO-based architecture with integrated attention mechanisms, termed YOLO-AMC (YOLO with Attention Mechanisms for Crack Detection), to enhance automated crack detection performance. Based on YOLOv11, the original C2PSA module is removed, and multiple attention mechanisms, including Global Attention Mechanism (GAM), Residual Convolutional Block Attention Module (Res-CBAM), and Shuffle Attention (SA), are introduced into the multi-scale feature fusion layers of the Neck to strengthen cross-scale feature integration. Experimental results demonstrate that YOLO-AMC consistently outperforms baseline models YOLOv11n and YOLOv8n across multiple evaluation metrics. Among the evaluated attention modules, GAM achieves the best detection performance, obtaining mAP@0.5 = 0.9917 and mAP@0.5:0.95 = 0.9506 on the test dataset, which are higher than those of YOLOv11 (0.9833 / 0.9112) and YOLOv8 (0.9707 / 0.8921). Furthermore, while maintaining a computational complexity of 7.6 GFLOPs, the proposed model achieves 110.95 FPS on an NVIDIA RTX 4090 platform and approximately 5 FPS on a Raspberry Pi 5 edge device, demonstrating a favorable trade-off between accuracy and deployment efficiency. The implementation code for this study is available on GitHub at https://github.com/CY-Tsai24/YOLO-AMC.
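To make the reported comparison easier to read, the short sketch below turns the abstract's figures into percentage-point gains over each baseline and into per-frame latency. The numbers are copied from the abstract; the dictionary keys and the helper names `gain_in_points` and `latency_ms` are illustrative choices made here, not part of the YOLO-AMC repository. The Raspberry Pi 5 figure is stated only as "approximately 5 FPS", so its latency is approximate as well.

```python
# Test-set metrics as reported in the abstract.
# Key names ("map50", "map50_95") are labels chosen for this sketch.
reported = {
    "YOLO-AMC (GAM)": {"map50": 0.9917, "map50_95": 0.9506},
    "YOLOv11": {"map50": 0.9833, "map50_95": 0.9112},
    "YOLOv8": {"map50": 0.9707, "map50_95": 0.8921},
}


def gain_in_points(model, baseline, metric):
    """Absolute gain of `model` over `baseline`, in percentage points."""
    delta = reported[model][metric] - reported[baseline][metric]
    return round(delta * 100, 2)


def latency_ms(fps):
    """Average time per frame, in milliseconds, implied by a throughput in FPS."""
    return 1000.0 / fps


if __name__ == "__main__":
    for baseline in ("YOLOv11", "YOLOv8"):
        for metric in ("map50", "map50_95"):
            points = gain_in_points("YOLO-AMC (GAM)", baseline, metric)
            print(f"{metric} gain over {baseline}: {points} points")

    # Throughput figures from the abstract (the edge-device value is approximate).
    print(f"RTX 4090: {latency_ms(110.95):.2f} ms per frame")
    print(f"Raspberry Pi 5: about {latency_ms(5):.0f} ms per frame")
```

On these figures the gain is larger on the stricter mAP@0.5:0.95 metric than on mAP@0.5 for both baselines.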

