Building facade defect inspection is fundamental to structural health monitoring and sustainable urban maintenance, yet it remains challenging because of extreme geometric variability, low contrast against complex backgrounds, and the inherent complexity of composite defects (e.g., cracks co-occurring with spalling). These characteristics lead to severe pixel imbalance and feature ambiguity, which, coupled with the scarcity of high-quality pixel-level annotations, hinder the generalization of existing detection and segmentation models. To address these gaps, we propose \textit{FacadeFixer}, a unified multi-agent framework that treats defect perception as a collaborative reasoning task rather than isolated recognition. Specifically, \textit{FacadeFixer} orchestrates specialized agents for detection and segmentation to handle multi-type defect interference, working in tandem with a generative agent to enable semantic recomposition. This process decouples intricate defects from noisy backgrounds and realistically synthesizes them onto diverse clean textures, generating high-fidelity augmented data with precise expert-level masks. To support this, we introduce a comprehensive multi-task dataset covering six primary facade categories with pixel-level annotations. Extensive experiments demonstrate that \textit{FacadeFixer} significantly outperforms state-of-the-art (SOTA) baselines. In particular, it excels at capturing pixel-level structural anomalies, and the results highlight generative synthesis as a robust solution to data scarcity in infrastructure inspection. Our code and dataset will be made publicly available.