State-of-the-art diffusion models can generate highly realistic images conditioned on inputs such as text, segmentation maps, and depth. However, the camera geometry used during image capture, and with it the influence of different optical systems on the final scene appearance, is often overlooked. This study introduces a framework that tightly integrates a text-to-image diffusion model with the particular lens geometry used in image rendering. Our method relies on per-pixel coordinate conditioning, which enables control over the rendering geometry. Notably, we demonstrate the manipulation of curvature properties, achieving diverse visual effects such as fish-eye, panoramic views, and spherical texturing using a single diffusion model.
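As a rough illustration of what per-pixel coordinate conditioning could look like, the sketch below builds a two-channel map that stores a (possibly warped) normalized coordinate for every pixel. The abstract does not specify the actual lens parameterization, so the function names, the radial distortion formula, and the parameter `k` are all assumptions made for illustration, not the authors' method.

```python
import numpy as np

def identity_coordinate_map(height, width):
    """Per-pixel (x, y) coordinates, normalized to the range [-1, 1]."""
    ys = np.linspace(-1.0, 1.0, height)
    xs = np.linspace(-1.0, 1.0, width)
    grid_x, grid_y = np.meshgrid(xs, ys)        # each has shape (height, width)
    return np.stack([grid_x, grid_y], axis=-1)  # shape (height, width, 2)

def radial_warp(coords, k):
    """Hypothetical radial distortion: scale each point by (1 + k * r^2).

    This is an assumed stand-in for a lens model; k = 0 leaves the grid unchanged.
    """
    r2 = np.sum(coords ** 2, axis=-1, keepdims=True)  # squared radius per pixel
    return coords * (1.0 + k * r2)

def conditioning_channels(height, width, k):
    """Return the warped coordinates as a channels-first array (2, height, width)."""
    coords = radial_warp(identity_coordinate_map(height, width), k)
    return np.transpose(coords, (2, 0, 1))
```

In a setup like this, the resulting array could be supplied to the diffusion model as extra input channels alongside the noisy image or latent, so that changing `k` (or swapping in a different warp) changes the geometry the model is asked to render.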