Defect segmentation is central to computer-vision-based inspection of infrastructure assets during both construction and operation. However, deployment remains limited by scarce pixel-level labels and by domain shift across environments. We introduce CrackSegFlow, a controllable Flow Matching (FM) synthesis method that renders synthetic crack images from masks with pixel-level alignment. Our renderer combines topology-preserving mask injection with edge gating to maintain the continuity of thin structures. A class-conditional FM model samples masks to provide topology diversity, and CrackSegFlow renders pixel-aligned images from those masks, which then serve as ground truth. We further inject cracks onto crack-free backgrounds to diversify confounders and reduce false positives. Across five datasets with a CNN-Transformer backbone, adding synthesized pairs yields in-domain gains of +5.37 mIoU and +5.13 F1, while target-guided cross-domain synthesis driven by target mask statistics yields gains of +13.12 mIoU and +14.82 F1. We also release CSF-50K, a benchmark dataset comprising 50,000 image-mask pairs.
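
The reported gains are expressed in mIoU and F1. As a minimal sketch of how these two metrics can be computed for a single binary crack mask, the Python snippet below may help. The abstract does not state the evaluation protocol, so the function name `binary_seg_metrics`, the averaging of IoU over the background and crack classes, and the per-image scope are all assumptions made for illustration.

```python
import numpy as np


def binary_seg_metrics(pred, target):
    """Return mIoU (mean over background and crack) and crack F1.

    Hypothetical helper: the averaging convention is an assumption,
    not the protocol used in the paper.
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)

    ious = []
    for cls in (False, True):  # False = background, True = crack
        p = pred == cls
        t = target == cls
        inter = np.logical_and(p, t).sum()
        union = np.logical_or(p, t).sum()
        # A class absent from both masks counts as a perfect match.
        ious.append(inter / union if union > 0 else 1.0)

    tp = np.logical_and(pred, target).sum()   # crack pixels found
    fp = np.logical_and(pred, ~target).sum()  # false crack pixels
    fn = np.logical_and(~pred, target).sum()  # missed crack pixels
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom > 0 else 1.0

    return {"miou": float(np.mean(ious)), "f1": float(f1)}


# Toy example: a 4x4 diagonal crack with one missed pixel.
target_mask = np.eye(4, dtype=bool)
pred_mask = target_mask.copy()
pred_mask[3, 3] = False
metrics = binary_seg_metrics(pred_mask, target_mask)
```

For thin structures such as cracks, the crack class covers few pixels, so one missed pixel moves the crack IoU far more than the background IoU. This is one reason both mIoU and F1 are commonly reported together.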