We present a novel method for reconstructing 3D objects from a single RGB image. Our method leverages recent image generation models to infer the hidden 3D structure while remaining faithful to the input image. Existing methods obtain impressive results when generating 3D models from text prompts, but they do not provide an easy way to condition on input RGB data. Na\"ive extensions of these methods often lead to misalignment in appearance between the input image and the 3D reconstruction. We address these challenges by introducing Image Constrained Radiance Fields (ConRad), a novel variant of neural radiance fields. ConRad is an efficient 3D representation that explicitly captures the appearance of the input image from one viewpoint. We propose a training algorithm that uses the single RGB image in conjunction with pretrained diffusion models to optimize the parameters of a ConRad representation. Extensive experiments show that ConRad representations simplify the preservation of image details while producing realistic 3D reconstructions. Compared to existing state-of-the-art baselines, our 3D reconstructions remain more faithful to the input and are more consistent, and they show significantly improved quantitative performance on a ShapeNet object benchmark.