CCD-3DR: Consistent Conditioning in Diffusion for Single-Image 3D Reconstruction

In this paper, we present a novel shape reconstruction method that leverages a diffusion model to generate a sparse 3D point cloud for the object captured in a single RGB image. Recent methods typically use a global embedding or local projection-based features as the condition that guides the diffusion model. However, such strategies fail to consistently align the denoised point cloud with the given image, leading to unstable conditioning and inferior performance. We present CCD-3DR, which exploits a novel centered diffusion probabilistic model for consistent local feature conditioning. We constrain the noise and the sampled point cloud from the diffusion model to a subspace in which the point cloud center remains unchanged during both the forward diffusion process and the reverse process. The stable point cloud center then serves as an anchor to align each point with its corresponding local projection-based features. Extensive experiments on the synthetic benchmark ShapeNet-R2N2 demonstrate that CCD-3DR outperforms all competitors by a large margin, with over 40% improvement. We also provide results on the real-world dataset Pix3D to demonstrate the potential of CCD-3DR in real-world applications. Code will be released soon.
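The centering constraint described in the abstract can be illustrated with a minimal sketch. It is not the authors' released code. It assumes the standard DDPM forward sample x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, and it assumes that "constraining the noise to the subspace" can be realized by subtracting the noise centroid. The names `centroid`, `center`, and `centered_forward_sample` are hypothetical and chosen only for this illustration.

```python
import math
import random


def centroid(points):
    """Return the mean (x, y, z) of a list of 3D points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def center(points):
    """Shift the points so that their centroid sits at the origin."""
    c = centroid(points)
    return [tuple(p[i] - c[i] for i in range(3)) for p in points]


def centered_forward_sample(x0, alpha_bar_t, rng):
    """Draw x_t from a centered x0 using zero-centroid Gaussian noise.

    Assumption: the standard DDPM forward sample is used, and the noise
    is centered by subtracting its centroid before it is mixed in.
    """
    noise = center([tuple(rng.gauss(0.0, 1.0) for _ in range(3)) for _ in x0])
    a = math.sqrt(alpha_bar_t)
    b = math.sqrt(1.0 - alpha_bar_t)
    x_t = [tuple(a * p[i] + b * e[i] for i in range(3)) for p, e in zip(x0, noise)]
    return x_t, noise


rng = random.Random(0)
# A toy 256-point cloud, centered once before diffusion starts.
x0 = center([tuple(rng.uniform(-1.0, 1.0) for _ in range(3)) for _ in range(256)])
x_t, noise = centered_forward_sample(x0, alpha_bar_t=0.5, rng=rng)
# Largest absolute centroid coordinate of the noised cloud.
max_drift = max(abs(c) for c in centroid(x_t))
```

Because both `x0` and `noise` have a zero centroid, any linear combination of them does too, so `max_drift` stays at floating-point rounding level for every noise level. That fixed center is the property the abstract relies on as an anchor for looking up local projection-based features.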


