Current 3D/4D generation methods are usually optimized for photorealism, efficiency, and aesthetics. However, they often fail to preserve the semantic identity of the subject across different viewpoints. Adapting generation methods with one or a few images of a specific subject (also known as personalization or subject-driven generation) allows them to generate visual content that aligns with the identity of the subject. However, personalized 3D/4D generation remains largely underexplored. In this work, we introduce TIRE (Track, Inpaint, REsplat), a novel method for subject-driven 3D/4D generation. TIRE takes as input an initial 3D asset produced by an existing 3D generative model and uses video tracking to identify the regions that need to be modified. We then adopt a subject-driven 2D inpainting model to progressively infill the identified regions. Finally, we resplat the modified 2D multi-view observations back to 3D while maintaining consistency across views. Extensive experiments demonstrate that our approach significantly improves identity preservation in 3D/4D generation compared to state-of-the-art methods. Our project website is available at https://zsh2000.github.io/track-inpaint-resplat.github.io/.
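The track and inpaint stages described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. It assumes that a region "needs to be modified" when no visible tracked point lands near it in a rendered view, which is only one plausible reading of the abstract. The names `untracked_mask`, `progressive_infill`, `inpaint_fn`, and the `radius` parameter are hypothetical, and the resplat stage (fitting the 3D representation to the edited views) is omitted entirely.

```python
import numpy as np


def untracked_mask(points_xy, visible, height, width, radius=1):
    """Return a boolean (height, width) mask that is True wherever no visible
    tracked point falls within `radius` pixels (square neighbourhood).

    Assumption: pixels without nearby tracks are the ones to inpaint.
    """
    covered = np.zeros((height, width), dtype=bool)
    for (x, y), vis in zip(points_xy, visible):
        if not vis:
            continue  # occluded or lost tracks give no coverage
        col, row = int(round(x)), int(round(y))
        if not (0 <= row < height and 0 <= col < width):
            continue  # track left the image
        r0, r1 = max(row - radius, 0), min(row + radius + 1, height)
        c0, c1 = max(col - radius, 0), min(col + radius + 1, width)
        covered[r0:r1, c0:c1] = True
    return ~covered


def progressive_infill(views, tracks, visibility, inpaint_fn, radius=1):
    """Visit the rendered views in the given order and inpaint only the
    untracked region of each one.

    `inpaint_fn(image, mask)` is a hypothetical stand-in for a
    subject-driven 2D inpainting model; the visiting order is an assumption.
    """
    results = []
    for image, pts, vis in zip(views, tracks, visibility):
        h, w = image.shape[:2]
        mask = untracked_mask(pts, vis, h, w, radius)
        results.append(inpaint_fn(image, mask))
    return results
```

In this sketch any callable with the signature `inpaint_fn(image, mask)` can be plugged in, for example a trivial fill such as `lambda img, m: np.where(m, 1.0, img)` for testing the masking logic before substituting a real model.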