State-of-the-art diffusion models can generate highly realistic images conditioned on inputs such as text, segmentation maps, and depth. However, the camera geometry used during image capture, and with it the influence of different optical systems on the final scene appearance, is often overlooked. This study introduces a framework that tightly integrates a text-to-image diffusion model with the particular lens geometry used in image rendering. Our method builds on per-pixel coordinate conditioning, which enables control over the rendering geometry. Notably, we demonstrate the manipulation of curvature properties, achieving diverse visual effects such as fish-eye, panoramic views, and spherical texturing with a single diffusion model.
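To make the idea of per-pixel coordinate conditioning concrete, the following is a minimal sketch of how such a conditioning map might be built. It is not the authors' implementation: the abstract does not specify the lens model or the tensor layout, so the function names (`coordinate_grid`, `radial_warp`, `conditioning_map`), the simple radial distortion, and the `strength` parameter are all assumptions made for illustration.

```python
import numpy as np


def coordinate_grid(height, width):
    """Return an (H, W, 2) array of (x, y) coordinates normalized to [-1, 1]."""
    ys = np.linspace(-1.0, 1.0, height)
    xs = np.linspace(-1.0, 1.0, width)
    grid_x, grid_y = np.meshgrid(xs, ys)  # each has shape (H, W)
    return np.stack([grid_x, grid_y], axis=-1)


def radial_warp(grid, strength):
    """Hypothetical lens model (an assumption, not taken from the paper).

    Each coordinate is scaled by (1 + strength * r^2), where r is its
    distance from the image center. strength = 0 leaves the grid unchanged.
    """
    r2 = np.sum(grid ** 2, axis=-1, keepdims=True)
    return grid * (1.0 + strength * r2)


def conditioning_map(height, width, strength):
    """Build a channel-first (2, H, W) float32 map of warped coordinates.

    A map like this could be supplied to a diffusion model alongside the
    noisy image, so that every pixel carries its own geometric position.
    """
    warped = radial_warp(coordinate_grid(height, width), strength)
    return np.transpose(warped, (2, 0, 1)).astype(np.float32)
```

Under these assumptions, `conditioning_map(64, 64, 0.0)` yields a plain normalized coordinate grid, while a nonzero `strength` bends the coordinates progressively more toward the image corners. Different lens geometries would then correspond to different warp functions applied to the same grid.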