Recently, 3D generative models have made impressive progress, enabling the generation of almost arbitrary 3D assets from text or image inputs. However, these approaches generate objects in isolation, without any consideration for the scene in which they will eventually be placed. In this paper, we propose a framework that stylizes an existing 3D asset to fit into a given 2D scene and additionally produces a photorealistic composition, as if the asset were placed within the environment. This not only opens up a new level of control for object stylization (for example, the same asset can be stylized to reflect changes in the environment, such as summer to winter or fantasy versus futuristic settings) but also makes the object-scene composition more controllable. We achieve this by modeling and optimizing the object's texture and the environmental lighting through differentiable ray tracing, guided by image priors from pre-trained text-to-image diffusion models. We demonstrate that our method is applicable to a wide variety of indoor and outdoor scenes and to arbitrary objects.