While existing anomaly synthesis methods have made remarkable progress, achieving both realism and diversity in the synthesized anomalies remains a major challenge. To address this, we propose AnomalyPainter, a zero-shot framework that breaks the diversity-realism trade-off by combining a Vision Language Large Model (VLLM), a Latent Diffusion Model (LDM), and our newly introduced texture library, Tex-9K. Tex-9K is a professional texture library containing 75 categories and 8,792 texture assets crafted for diverse anomaly synthesis. Leveraging the VLLM's general knowledge, AnomalyPainter generates plausible textual anomaly descriptions for each industrial object and matches them with relevant, diverse textures from Tex-9K. These textures then guide the LDM via ControlNet to paint anomalies onto normal images. Furthermore, we introduce Texture-Aware Latent Init to stabilize the ControlNet, which is trained on natural images, when it is applied to industrial images. Extensive experiments show that AnomalyPainter outperforms existing methods in realism, diversity, and generalization, and achieves superior downstream performance.
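To make the description-to-texture matching step more concrete, the following is a minimal sketch, not the authors' implementation. It assumes each texture carries a short caption and uses a toy bag-of-words cosine similarity as a stand-in for whatever text encoder an actual system would use. The function names (`embed`, `cosine`, `match_textures`) and the entries in `TOY_LIBRARY` are hypothetical and are not taken from Tex-9K.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; a stand-in (assumption) for a real text encoder."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_textures(description, library, top_k=2):
    """Return up to top_k texture names whose captions best match the description."""
    query = embed(description)
    scored = [(cosine(query, embed(caption)), name)
              for name, caption in library.items()]
    # Highest similarity first; ties broken alphabetically for determinism.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [name for score, name in scored[:top_k] if score > 0]


# Hypothetical texture entries for illustration; not actual Tex-9K assets.
TOY_LIBRARY = {
    "rust_01": "orange rust corrosion on metal surface",
    "scratch_03": "thin scratch marks on metal surface",
    "mold_02": "green mold stain on fabric",
}

# A description of the kind a VLLM might produce for a metal part (assumed).
matches = match_textures("rust corrosion on the metal nut", TOY_LIBRARY)
print(matches)
```

In a fuller pipeline, the selected textures would then serve as the conditioning input to the ControlNet-guided LDM. That stage is omitted here because the abstract does not specify its details.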