The recently introduced Forward-Diffusion method makes it possible to train a 3D diffusion model using only 2D images for supervision. However, it does not generalise easily to different 3D representations, and it requires a computationally expensive auto-regressive sampling process to generate the underlying 3D scenes. In this paper, we propose GOEn: Gradient Origin Encoding (pronounced "gone"). GOEn can encode input images into any type of 3D representation without using a pre-trained image feature extractor. By design, it handles a single source view, multiple source views, or none at all, and it aims to maximise the information transferred from the views to the encodings. Our proposed GOEnFusion model pairs GOEn encodings with a realisation of the Forward-Diffusion model that addresses the limitations of the vanilla Forward-Diffusion realisation. Through the lens of a partial AutoEncoder, we evaluate how much information the GOEn mechanism transfers to the encoded representations and how well it captures the prior distribution over the underlying 3D scenes. Lastly, we evaluate the efficacy of the GOEnFusion model on the recently proposed OmniObject3D dataset, comparing it with state-of-the-art Forward-Diffusion and non-Forward-Diffusion models as well as other 3D generative models.