Synthetic dataset generation in computer vision, particularly for industrial applications, remains underexplored. Industrial defect segmentation, for instance, requires highly accurate labels, yet acquiring such data is costly and time-consuming. To address this challenge, we propose a novel diffusion-based pipeline for generating high-fidelity industrial datasets with minimal supervision. Our approach conditions the diffusion model on enriched bounding-box representations to produce precise segmentation masks, ensuring realistic and accurately localized defect synthesis. Compared to existing layout-conditioned generative methods, our approach improves defect consistency and spatial accuracy. We introduce two quantitative metrics to evaluate the effectiveness of our method and assess its impact on a downstream segmentation model trained on real and synthetic data. Our results demonstrate that diffusion-based synthesis can bridge the gap between artificial and real-world industrial data, fostering more reliable and cost-efficient segmentation models. The code is publicly available at https://github.com/covisionlab/diffusion_labeling.