The Stable Diffusion model has been widely used in research on architectural image generation, but the controllability of the generated image content still leaves room for improvement. This work proposes a text-to-building-facade image generation method that combines multiple networks. We first fine-tune the Stable Diffusion model on the CMP Facades dataset using the LoRA (Low-Rank Adaptation) approach, then apply the ControlNet model to further control the output. Finally, we compare the facade generation results under various architectural-style text prompts and control strategies. The results demonstrate that the LoRA training approach significantly decreases the difficulty of fine-tuning the large Stable Diffusion model, and that adding the ControlNet model increases the controllability of text-to-building-facade image generation. This provides a foundation for subsequent studies on the generation of architectural images.
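The claim that LoRA lowers the cost of fine-tuning can be illustrated with a small sketch of the low-rank update itself. LoRA keeps a pretrained weight matrix W0 frozen and trains only two small matrices B and A, so that the effective weight is W0 + (alpha / r) · B·A. The sketch below is a dependency-free illustration under stated assumptions, not the training code used in this work: the layer sizes, the rank, the scaling factor `alpha`, and the helper names `matmul`, `lora_weight`, and `trainable_params` are all hypothetical, since the abstract does not report them.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_weight(w0, a, b, alpha, rank):
    """Return the effective weight W0 + (alpha / rank) * (B @ A)."""
    delta = matmul(b, a)  # (d_out x r) @ (r x d_in) -> (d_out x d_in)
    scale = alpha / rank
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(w0, delta)
    ]


def trainable_params(d_out, d_in, rank):
    """Return (LoRA trainable parameters, full fine-tuning parameters) for one layer."""
    return rank * (d_in + d_out), d_out * d_in


# Hypothetical toy sizes; not taken from the paper.
d_out, d_in, rank, alpha = 8, 6, 2, 4.0
rng = random.Random(0)

w0 = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]   # frozen pretrained weight
a = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]  # trainable, r x d_in
b = [[0.0] * rank for _ in range(d_out)]                              # trainable, d_out x r, zero-initialised

# With B initialised to zero, the adapted layer starts out identical to the pretrained one.
w_eff = lora_weight(w0, a, b, alpha, rank)

# Parameter count for an assumed 768 x 768 projection layer with rank 4.
lora_count, full_count = trainable_params(768, 768, 4)
print(lora_count, full_count, lora_count / full_count)
```

For the assumed 768 × 768 layer with rank 4, only the entries of A and B are trained, which is roughly 1% of the parameters that full fine-tuning of the same layer would update. Because B starts at zero, training begins from the unmodified pretrained model.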