We introduce a novel 3D generative method, Generative 3D Reconstruction (G3DR) in ImageNet, capable of generating diverse and high-quality 3D objects from single images, addressing the limitations of existing methods. At the heart of our framework is a novel depth regularization technique that enables the generation of scenes with high geometric fidelity. G3DR also leverages a pretrained language-vision model, such as CLIP, to enable reconstruction in novel views and improve the visual realism of generations. Additionally, we design a simple but effective sampling procedure that further improves the quality of generations. G3DR offers diverse and efficient 3D asset generation based on class or text conditioning. Despite its simplicity, G3DR outperforms state-of-the-art methods, improving over them by up to 22% in perceptual metrics and up to 90% in geometry scores, while needing only half of the training time. Code is available at https://github.com/preddy5/G3DR