Viewset Diffusion: (0-)Image-Conditioned 3D Generative Models from 2D Data

We present Viewset Diffusion: a framework for training image-conditioned 3D generative models from 2D data. Image-conditioned 3D generative models allow us to address the inherent ambiguity in single-view 3D reconstruction. Given one image of an object, there is often more than one possible 3D volume that matches the input image, because a single image never captures all sides of an object. Deterministic models are inherently limited to producing one possible reconstruction and therefore make mistakes in ambiguous settings. Modelling distributions of 3D shapes is challenging because 3D ground truth data is often not available. We propose to solve the issue of data availability by training a diffusion model which jointly denoises a multi-view image set. We constrain the output of Viewset Diffusion models to a single 3D volume per image set, guaranteeing consistent geometry. Training is done through reconstruction losses on renderings, allowing training with only three images per object. Our architecture and training scheme allow our model to perform 3D generation and generative, ambiguity-aware single-view reconstruction in a feed-forward manner. Project page: szymanowiczs.github.io/viewset-diffusion.
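The training idea described above (noise a set of views, map the set to a single volume, render it back, and penalise the rendering error) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The orthographic renderer, the unprojection-based `reconstruct_volume` stand-in for the learned network, the 8x8x8 grid, and the noise levels are all assumptions chosen so the example stays small and runnable. Setting the first view's noise level to zero mimics conditioning on one clean input image.

```python
import numpy as np

def render_views(volume):
    """Toy orthographic renderer (assumption): average density along each axis."""
    return np.stack([volume.mean(axis=a) for a in range(3)])  # shape (3, D, D)

def add_noise(views, alpha_bar, rng):
    """Per-view forward diffusion: x_t = sqrt(a) * x_0 + sqrt(1 - a) * eps."""
    a = np.asarray(alpha_bar, dtype=float).reshape(-1, 1, 1)
    eps = rng.standard_normal(views.shape)
    return np.sqrt(a) * views + np.sqrt(1.0 - a) * eps

def reconstruct_volume(noisy_views):
    """Hypothetical stand-in for the learned network: unproject every view
    along its own axis and average, giving ONE volume for the whole viewset."""
    d = noisy_views.shape[-1]
    vol = np.zeros((d, d, d))
    for axis, view in enumerate(noisy_views):
        vol += np.expand_dims(view, axis=axis)  # broadcast along the ray axis
    return vol / len(noisy_views)

def viewset_loss(clean_views, alpha_bar, rng):
    """Noise the viewset, predict one volume, render it, compare to clean views."""
    noisy = add_noise(clean_views, alpha_bar, rng)
    volume = reconstruct_volume(noisy)
    denoised = render_views(volume)  # all views come from the same volume
    loss = float(np.mean((denoised - clean_views) ** 2))
    return loss, volume, noisy

rng = np.random.default_rng(0)
gt_volume = rng.random((8, 8, 8))      # used only to synthesise the 2D views
clean_views = render_views(gt_volume)  # three images of one object
alpha_bar = [1.0, 0.5, 0.5]            # view 0 is clean, i.e. the conditioning image
loss, volume, noisy = viewset_loss(clean_views, alpha_bar, rng)
print(f"rendering loss: {loss:.4f}")
```

Because every denoised view is rendered from the same `volume`, the views are geometrically consistent by construction, and the loss only ever touches 2D images. In a real model the stand-in reconstructor would be a trained network and the renderer would be differentiable.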
