Latent diffusion models have achieved state-of-the-art results in generating and manipulating visual content. However, to our knowledge, the joint generation of depth maps and RGB images remains limited. We introduce LDM3D-VR, a suite of diffusion models for virtual reality development that comprises LDM3D-pano and LDM3D-SR. LDM3D-pano generates panoramic RGBD images from textual prompts, and LDM3D-SR upscales low-resolution inputs to high-resolution RGBD. Both models are fine-tuned from existing pretrained models on datasets containing panoramic or high-resolution RGB images, depth maps, and captions. We evaluate both models against existing related methods.