Generative models, e.g., Stable Diffusion, have enabled the creation of photorealistic images from text prompts. Yet, the generation of 360-degree panorama images from text remains a challenge, particularly due to the dearth of paired text-panorama data and the domain gap between panorama and perspective images. In this paper, we introduce a novel dual-branch diffusion model named PanFusion to generate a 360-degree image from a text prompt. We leverage Stable Diffusion as one branch to provide prior knowledge in natural image generation and register it to another panorama branch for holistic image generation. We propose a unique cross-attention mechanism with projection awareness to minimize distortion during the collaborative denoising process. Our experiments validate that PanFusion surpasses existing methods and, thanks to its dual-branch structure, can integrate additional constraints like room layout for customized panorama outputs. Code is available at https://chengzhag.github.io/publication/panfusion.