We present Viewset Diffusion: a framework for training image-conditioned 3D generative models from 2D data. Image-conditioned 3D generative models allow us to address the inherent ambiguity in single-view 3D reconstruction. Given one image of an object, there is often more than one possible 3D volume that matches the input image, because a single image never captures all sides of an object. Deterministic models are inherently limited to producing one possible reconstruction and therefore make mistakes in ambiguous settings. Modelling distributions of 3D shapes is challenging because 3D ground truth data is often not available. We propose to solve the issue of data availability by training a diffusion model which jointly denoises a multi-view image set. We constrain the output of Viewset Diffusion models to a single 3D volume per image set, guaranteeing consistent geometry. Training is done through reconstruction losses on renderings, allowing training with only three images per object. Our architecture and training scheme allow our model to perform 3D generation and generative, ambiguity-aware single-view reconstruction in a feed-forward manner. Project page: szymanowiczs.github.io/viewset-diffusion.
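
The training signal described above can be illustrated with a minimal sketch: noise a set of views, map the noisy set to one shared representation, render that representation back into every view, and penalise the difference to the clean images. This is not the authors' implementation. `toy_reconstruct` and `toy_render` are hypothetical stand-ins for the learned network and the differentiable renderer, and the standard DDPM forward step with a single `alpha_bar` shared by all views is an assumption made for illustration.

```python
import numpy as np


def noise_viewset(x0, alpha_bar, rng):
    """DDPM-style forward step applied to every view in the set.

    Assumption: one noise level shared by all views.
    """
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps


def toy_reconstruct(noisy_views):
    """Hypothetical stand-in for the network.

    Collapses all noisy views into ONE shared representation, mirroring the
    constraint of a single 3D volume per image set.
    """
    return noisy_views.mean(axis=0)


def toy_render(volume, num_views):
    """Hypothetical stand-in for a differentiable renderer.

    Produces one image per view from the shared representation.
    """
    return np.broadcast_to(volume, (num_views,) + volume.shape)


def viewset_loss(x0, alpha_bar, rng):
    """Reconstruction loss between renderings and the clean views."""
    noisy = noise_viewset(x0, alpha_bar, rng)
    volume = toy_reconstruct(noisy)
    renders = toy_render(volume, x0.shape[0])
    return float(np.mean((renders - x0) ** 2))


rng = np.random.default_rng(0)
views = rng.random((3, 8, 8, 3))  # three views per object, as in the abstract
loss = viewset_loss(views, alpha_bar=0.5, rng=rng)
```

Because every rendering comes from the same intermediate representation, the denoised views agree with each other by construction. A real model would replace the averaging step with a learned image-to-volume network and the broadcast with camera-dependent volume rendering.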