3D style transfer aims to render stylized views of a 3D scene in a specified style, which requires both high-quality generation and multi-view consistency. Existing methods still struggle to produce high-quality stylization with fine texture details and to support stylization with multimodal guidance. In this paper, we show that the common way of training NeRF for stylization, which uses 2D style transfer models to generate stylized multi-view supervision, causes the same object to appear in different states (color tone, details, etc.) across the supervised views. This inconsistency leads NeRF to smooth out texture details and, in turn, results in low-quality rendering for 3D multi-style transfer. To tackle these problems, we propose a novel Multimodal-guided 3D Multi-style transfer of NeRF, termed MM-NeRF. First, MM-NeRF projects multimodal guidance into a unified space to keep multimodal styles consistent and extracts multimodal features to guide 3D stylization. Second, we propose a novel multi-head learning scheme to ease the difficulty of learning multi-style transfer, and a multi-view style consistent loss to address the inconsistency of multi-view supervision data. Finally, we propose a novel incremental learning mechanism that generalizes MM-NeRF to any new style at small cost. Extensive experiments on several real-world datasets show that MM-NeRF achieves high-quality 3D multi-style stylization with multimodal guidance while maintaining multi-view consistency and style consistency across multimodal guidance. Code will be released.
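The smoothing effect attributed to inconsistent multi-view supervision can be illustrated with a toy 1D example. The sketch below is not the MM-NeRF method or its loss; it only assumes a plain squared-error fit, for which the best single estimate of a sample is the mean of its targets over views. The striped "texture", the phase offsets, and the helper names `make_view` and `high_freq_energy` are all hypothetical and chosen purely for illustration.

```python
import math


def make_view(n, phase):
    """Striped 1D 'texture' as seen in one view; `phase` shifts the stripes."""
    return [math.sin(2 * math.pi * 8 * i / n + phase) for i in range(n)]


def high_freq_energy(signal):
    """Mean squared difference between neighbouring samples (crude detail measure)."""
    diffs = [(b - a) ** 2 for a, b in zip(signal, signal[1:])]
    return sum(diffs) / len(diffs)


n = 64
# Assumed per-view inconsistency: the same stripes land at slightly
# different positions in each stylized view (phase offsets in radians).
phases = [-0.9, -0.3, 0.3, 0.9]
views = [make_view(n, p) for p in phases]

# Under a squared-error objective sum_k (x - v_k)^2, the minimiser for each
# sample is the mean of the per-view targets.
consensus = [sum(v[i] for v in views) / len(views) for i in range(n)]

view_detail = [high_freq_energy(v) for v in views]
consensus_detail = high_freq_energy(consensus)

print("detail per view:   ", [round(d, 3) for d in view_detail])
print("detail of consensus:", round(consensus_detail, 3))
```

In this toy setting the averaged signal carries less neighbour-to-neighbour variation than any individual view, which mirrors the loss of texture detail described above; larger disagreements between views reduce it further.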