While diffusion-based generative models have achieved remarkable success for 3D shapes, 4D generative modeling remains challenging due to the complexity of object deformations over time. We propose DNF, a new 4D representation for unconditional generative modeling that efficiently models deformable shapes with disentangled shape and motion while capturing high-fidelity details of the deforming objects. To this end, we propose a dictionary learning approach that disentangles 4D motion from shape, representing both as neural fields. Shape and motion are each encoded in a learned latent space, where every deformable shape is represented by its global shape and motion latent codes, shape-specific coefficient vectors, and shared dictionary information. This captures both shape-specific detail and globally shared information in the learned dictionary. Our dictionary-based representation strikes a good balance among fidelity, continuity, and compression; combined with a transformer-based diffusion model, our method generates effective, high-fidelity 4D animations.
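The core idea of the dictionary-based representation can be sketched numerically: each shape's parameters are reconstructed as a shape-specific coefficient vector applied to a dictionary shared across all shapes. The sizes, variable names, and random data below are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: each deformable shape's parameter vector (dim d)
# is factored as shape-specific coefficients over a shared dictionary
# of k basis vectors, in the spirit of dictionary learning.
d, k, n_shapes = 256, 32, 100

dictionary = rng.standard_normal((k, d))     # shared across all shapes
coeffs = rng.standard_normal((n_shapes, k))  # shape-specific coefficients

# Reconstruct per-shape parameters from the shared dictionary.
weights = coeffs @ dictionary                # shape (n_shapes, d)

# Storage vs. keeping each shape's parameters independently:
stored = coeffs.size + dictionary.size       # 100*32 + 32*256 = 11392
full = n_shapes * d                          # 100*256 = 25600
print(f"compression: {stored}/{full} = {stored / full:.2f}")
```

With k much smaller than d, the shared dictionary amortizes most of the storage across shapes, which is where the compression in the abstract's fidelity/continuity/compression trade-off comes from.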