In contemporary design practice, the integration of computer vision and generative artificial intelligence (genAI) marks a transformative shift towards more interactive and inclusive processes. These technologies open new possibilities for image analysis and generation that are particularly relevant to urban landscape reconstruction. This paper presents a novel workflow, encapsulated in a prototype application called UrbanGenAI, that combines advanced image segmentation with diffusion models to support a comprehensive approach to urban design. Our methodology uses the OneFormer model for detailed panoptic image segmentation and the Stable Diffusion XL (SDXL) diffusion model, implemented through ControlNet, to generate images from textual descriptions. Validation results indicated strong performance of the prototype, with high accuracy in both object detection and text-to-image generation, as evidenced by high Intersection over Union (IoU) and CLIP scores across iterative evaluations of various categories of urban landscape features. Preliminary testing used UrbanGenAI both as an educational tool to enhance learning in design pedagogy and as a participatory instrument to facilitate community-driven urban planning. Early results suggested that UrbanGenAI not only advances the technical frontiers of urban landscape reconstruction but also offers substantial pedagogical and participatory planning benefits. Ongoing development of UrbanGenAI aims to validate its effectiveness across broader contexts and to integrate additional features such as real-time feedback mechanisms and 3D modelling capabilities.

Keywords: generative AI; panoptic image segmentation; diffusion models; urban landscape design; design pedagogy; co-design
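Intersection over Union (IoU), one of the two validation metrics mentioned in the abstract, measures how well a predicted segmentation mask overlaps a reference mask for a given feature category: the number of pixels labelled with that class in both masks divided by the number labelled with it in either. The following is a minimal sketch that assumes masks are flattened lists of integer class labels. The function names and toy labels are hypothetical and are not taken from the UrbanGenAI evaluation code.

```python
def per_class_iou(pred, target, class_id):
    """IoU for one class: |pred AND target| / |pred OR target| over pixels.

    Hypothetical helper: masks are assumed to be flattened lists of
    integer class labels of equal length.
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = 0
    union = 0
    for p, t in zip(pred, target):
        in_pred = p == class_id
        in_target = t == class_id
        if in_pred and in_target:
            intersection += 1
        if in_pred or in_target:
            union += 1
    # A class absent from both masks has undefined IoU; return None for it.
    return intersection / union if union else None


def mean_iou(pred, target, class_ids):
    """Average per-class IoU, skipping classes absent from both masks."""
    scores = []
    for class_id in class_ids:
        score = per_class_iou(pred, target, class_id)
        if score is not None:
            scores.append(score)
    return sum(scores) / len(scores) if scores else None


# Toy example with three hypothetical classes (e.g. 0 = road, 1 = tree, 2 = building).
pred = [0, 1, 1, 2, 2, 2]
target = [0, 1, 2, 2, 2, 1]
print(round(mean_iou(pred, target, [0, 1, 2]), 4))  # 0.6111
```

Averaging only over classes that actually appear is one common convention; an evaluation could instead count absent classes differently, so reported mean IoU values depend on that choice.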