Deep Generative Models (DGMs) are widely used to create innovative designs across multiple industries, ranging from fashion to the automotive sector. Beyond generating images of high visual quality, structural design generation imposes more stringent constraints on semantic expression, e.g., no floating material or missing parts, which we refer to as plausibility in this work. We examine how the noise schedule of a diffusion model affects the plausibility of its outputs: there exists a range of noise levels at which the model's performance determines the plausibility of the result. We also propose two techniques to determine this range for a given image set and devise a novel parametric noise schedule for better plausibility. We apply this noise schedule to the training and sampling of the well-known diffusion model EDM and compare it to EDM's default noise schedule. Compared to the default, our schedule raises the rate of plausible designs from 83.4% to 93.5% and improves the Fréchet Inception Distance (FID) from 7.84 to 4.87. Further applications of advanced image editing tools demonstrate the model's solid understanding of structure.
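For orientation, the baseline referred to above, EDM's default noise schedule, can be sketched as follows. This is a minimal illustration, assuming the defaults published with EDM (sampling: `sigma_min = 0.002`, `sigma_max = 80`, `rho = 7`; training: log-normal noise levels with `P_mean = -1.2`, `P_std = 1.2`). The function names are hypothetical, and the parametric schedule proposed in this work is not reproduced here because the abstract does not give its form.

```python
import math
import random


def edm_sampling_sigmas(num_steps=18, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """Noise levels for sampling, ordered from sigma_max down to sigma_min.

    Interpolates linearly in sigma**(1/rho) space, which concentrates
    steps at low noise levels when rho > 1. The final sigma = 0 that a
    sampler would append is omitted here.
    """
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    hi = sigma_max ** (1.0 / rho)
    lo = sigma_min ** (1.0 / rho)
    return [
        (hi + i / (num_steps - 1) * (lo - hi)) ** rho
        for i in range(num_steps)
    ]


def edm_training_sigma(rng, p_mean=-1.2, p_std=1.2):
    """Draw one training noise level with ln(sigma) ~ N(p_mean, p_std**2)."""
    return math.exp(rng.gauss(p_mean, p_std))


if __name__ == "__main__":
    sigmas = edm_sampling_sigmas()
    print("sampling sigmas:", [round(s, 4) for s in sigmas])
    rng = random.Random(0)
    print("training sigmas:", [round(edm_training_sigma(rng), 4) for _ in range(5)])
```

Both pieces determine which noise levels the model sees: the log-normal distribution sets where training effort is spent, and the `rho`-spaced sequence sets where sampling steps are placed. A schedule aimed at plausibility would presumably shift this emphasis toward the noise range identified as decisive.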