While generative models have seen significant adoption across a wide range of data modalities, including 3D data, no consensus has yet been reached on which model is best suited to which task. Further, some forms of conditional information used to steer the generation process, such as text and images, are frequently employed, whereas others, like partial 3D data, have not been thoroughly evaluated. In this work, we compare two of the most promising generative models, Denoising Diffusion Probabilistic Models and Autoregressive Causal Transformers, which we adapt for the tasks of generative shape modeling and completion. We conduct a thorough quantitative evaluation and comparison on both tasks, including a baseline discriminative model and an extensive ablation study. Our results show that (1) the diffusion model with continuous latents outperforms both the discriminative model and the autoregressive approach and delivers state-of-the-art performance on multi-modal shape completion from a single, noisy depth image under realistic conditions, and (2) when compared on the same discrete latent space, the autoregressive model can match or exceed diffusion performance on these tasks.