We present Viewset Diffusion, a diffusion-based generator that outputs 3D objects while using only multi-view 2D data for supervision. We note that there exists a one-to-one mapping between viewsets, i.e., collections of several 2D views of an object, and 3D models. Hence, we train a diffusion model to generate viewsets, but design the neural network generator to internally reconstruct the corresponding 3D models, thus generating those too. We fit the diffusion model to a large number of viewsets for a given category of objects. The resulting generator can be conditioned on zero, one, or more input views. Conditioned on a single view, it performs 3D reconstruction that accounts for the ambiguity of the task and allows sampling multiple solutions compatible with the input. The model performs reconstruction efficiently, in a feed-forward manner, and is trained using only rendering losses, with as few as three views per viewset. Project page: szymanowiczs.github.io/viewset-diffusion.
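The following NumPy code is a minimal sketch of how a viewset could be noised for diffusion training and how conditioning on zero, one, or more views could be expressed. It makes assumptions that the abstract does not state. It uses a standard DDPM-style forward process with a cosine noise schedule. It represents conditioning views as views left clean at noise level t = 0, while the remaining views share one random noise level. The functions `noise_viewset`, `make_timesteps`, and `rendering_loss` are hypothetical names for illustration. The neural generator, which would reconstruct a 3D model from the noisy views and render it back into the views being compared, is not shown.

```python
import numpy as np


def cosine_alpha_bar(t, T, s=0.008):
    """Cumulative signal level alpha_bar(t); the cosine schedule is an assumption."""
    f = lambda u: np.cos((u / T + s) / (1 + s) * np.pi / 2) ** 2
    return f(t) / f(0.0)


def make_timesteps(num_views, num_cond, T, rng):
    """Per-view noise levels (hypothetical helper).

    The first num_cond views are treated as conditioning inputs and kept clean
    (t = 0); the remaining views share one noise level drawn from [1, T].
    num_cond = 0 corresponds to unconditional generation.
    """
    t = np.full(num_views, rng.integers(1, T + 1))
    t[:num_cond] = 0
    return t


def noise_viewset(views, timesteps, T, rng):
    """Apply the forward diffusion process to each view of a viewset.

    views: array of shape (N, H, W, C) holding N views of one object.
    timesteps: int array of shape (N,), one noise level per view.
    Returns the noisy views and the Gaussian noise that was added.
    """
    alpha_bar = cosine_alpha_bar(timesteps.astype(float), T)[:, None, None, None]
    eps = rng.standard_normal(views.shape)
    noisy = np.sqrt(alpha_bar) * views + np.sqrt(1.0 - alpha_bar) * eps
    return noisy, eps


def rendering_loss(rendered, views):
    """Mean squared error between views rendered from a predicted 3D model
    and the clean viewset (the exact loss used is an assumption)."""
    return float(np.mean((rendered - views) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    views = rng.uniform(-1.0, 1.0, size=(3, 8, 8, 3))  # toy viewset: 3 RGB views
    t = make_timesteps(num_views=3, num_cond=1, T=1000, rng=rng)
    noisy, _ = noise_viewset(views, t, T=1000, rng=rng)
    print(t, np.array_equal(noisy[0], views[0]))  # the conditioning view stays clean
```

Under these assumptions, single-view reconstruction corresponds to `num_cond=1`, and the same function covers the zero-view and multi-view settings simply by changing `num_cond`. Because the clean views are still passed to the generator as part of the viewset, no separate conditioning branch is needed in this sketch.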