While existing anomaly synthesis methods have made remarkable progress, achieving both realism and diversity in synthesis remains a major challenge. To address this, we propose AnomalyPainter, a zero-shot framework that breaks the diversity-realism trade-off by combining a Vision Language Large Model (VLLM), a Latent Diffusion Model (LDM), and our newly introduced texture library, Tex-9K. Tex-9K is a professional texture library containing 75 categories and 8,792 texture assets crafted for diverse anomaly synthesis. Leveraging the VLLM's general knowledge, AnomalyPainter generates plausible textual anomaly descriptions for each industrial object and matches them with relevant, diverse textures from Tex-9K. These textures then guide the LDM via ControlNet to paint anomalies onto normal images. Furthermore, we introduce Texture-Aware Latent Init to stabilize the ControlNet, which is trained on natural images, when it is applied to industrial images. Extensive experiments show that AnomalyPainter outperforms existing methods in realism, diversity, and generalization, achieving superior downstream performance.