DITTO-NeRF: Diffusion-based Iterative Text To Omni-directional 3D Model


NeRF · 3D · IB · Reconstruction · 3D model

April 6, 2023


Hoigi Seo, Hayeon Kim, Gwanghyun Kim, Se Young Chun

From arXiv. Project page: https://janeyeon.github.io/ditto-nerf/

The increasing demand for high-quality 3D content creation has motivated the development of automated methods for creating 3D object models from a single image and/or a text prompt. However, 3D objects reconstructed by state-of-the-art image-to-3D methods still exhibit low correspondence to the given image and low multi-view consistency. Recent state-of-the-art text-to-3D methods are also limited: they yield 3D samples with low diversity per prompt and require long synthesis times. To address these challenges, we propose DITTO-NeRF, a novel pipeline that generates a high-quality 3D NeRF model from a text prompt or a single image. DITTO-NeRF first constructs a high-quality partial 3D object for a limited range of in-boundary (IB) angles, using the given (or text-generated) 2D image of the frontal view, and then iteratively reconstructs the remaining 3D NeRF with an inpainting latent diffusion model. We propose progressive 3D object reconstruction schemes in terms of scale (low to high resolution), angle (IB angles first, outer-boundary (OB) angles later), and mask (object to background boundary), so that high-quality information in the IB region can be propagated into the OB region. DITTO-NeRF outperforms state-of-the-art methods in fidelity and diversity, both qualitatively and quantitatively, with much faster training than prior image/text-to-3D methods such as DreamFusion and NeuralLift-360.
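The three progressive schemes named in the abstract (scale, angle, mask) amount to a training curriculum. The sketch below illustrates that idea only; it is not the authors' code. The names `StageConfig`, `progressive_schedule` and `sample_azimuth`, the resolution-doubling and linear schedules, and every default value (64 to 512 px, ±30° IB widening to ±180° OB, 32 px of mask growth) are hypothetical assumptions chosen for illustration, and the paper's actual schedule may differ.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class StageConfig:
    """Hypothetical per-step settings for a coarse-to-fine, IB-to-OB curriculum."""
    resolution: int          # side length (pixels) of the rendered training views
    max_azimuth_deg: float   # cameras are sampled within +/- this angle of the frontal view
    mask_dilation_px: int    # how far the inpainting mask grows outward from the object


def progressive_schedule(step: int, total_steps: int, *,
                         min_res: int = 64, max_res: int = 512,
                         ib_deg: float = 30.0, ob_deg: float = 180.0,
                         max_dilation_px: int = 32) -> StageConfig:
    """Map a training step to (resolution, angle range, mask extent).

    All schedules and defaults here are illustrative assumptions.
    """
    if total_steps < 1:
        raise ValueError("total_steps must be >= 1")
    if min_res < 1 or max_res < min_res:
        raise ValueError("need 1 <= min_res <= max_res")
    # Normalised training progress, clamped to [0, 1].
    t = min(max(step / total_steps, 0.0), 1.0)

    # Scale: double the resolution at evenly spaced stages (low -> high).
    n_doublings = (max_res // min_res).bit_length() - 1  # 64 -> 512 gives 3
    stage = min(int(t * (n_doublings + 1)), n_doublings)
    resolution = min_res * 2 ** stage

    # Angle: widen linearly from the in-boundary (IB) range to the outer boundary (OB).
    max_azimuth = ib_deg + t * (ob_deg - ib_deg)

    # Mask: start at the object silhouette, then extend toward the background.
    dilation = round(t * max_dilation_px)

    return StageConfig(resolution, max_azimuth, dilation)


def sample_azimuth(cfg: StageConfig, rng: random.Random) -> float:
    """Draw a camera azimuth (degrees, 0 = frontal view) inside the current range."""
    return rng.uniform(-cfg.max_azimuth_deg, cfg.max_azimuth_deg)


if __name__ == "__main__":
    rng = random.Random(0)
    for step in (0, 2500, 5000, 7500, 10000):
        cfg = progressive_schedule(step, 10_000)
        print(step, cfg, f"sample azimuth = {sample_azimuth(cfg, rng):.1f}")
```

With `total_steps=10_000`, steps 0, 5000 and 10000 map to 64 px / ±30° / 0 px, 256 px / ±105° / 16 px and 512 px / ±180° / 32 px respectively. Linear interpolation is used here only because it is the simplest monotone schedule; a real system might instead hold each stage for a fixed number of iterations.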

