A surge of recent 3D style transfer methods leverages the scene reconstruction power of a pre-trained neural radiance field (NeRF). To stylize a scene this way, one must first reconstruct a photo-realistic radiance field from collected images of the scene. However, when only sparse input views are available, pre-trained few-shot NeRFs often suffer from high-frequency artifacts, which arise as a by-product of the high-frequency details introduced to improve reconstruction quality. Is it possible to generate more faithful stylized scenes from sparse inputs by directly optimizing an encoding-based scene representation toward the target style? In this paper, we approach the stylization of sparse-view scenes by disentangling content semantics from style textures. We propose a coarse-to-fine sparse-view scene stylization framework, in which a novel hierarchical encoding-based neural representation is designed to generate high-quality stylized scenes directly from implicit scene representations. We also propose a new optimization strategy with content strength annealing to achieve realistic stylization and better content preservation. Extensive experiments demonstrate that our method achieves high-quality stylization of sparse-view scenes and outperforms fine-tuning-based baselines in both stylization quality and efficiency.
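The abstract names content strength annealing but does not specify the schedule. As a rough illustration only, the sketch below assumes the content-loss weight is annealed from a high value to a low one along a cosine ramp and added to a style term. The function names, the direction of the annealing, the cosine shape, and the default weights are all assumptions made for this example and are not taken from the paper.

```python
import math


def content_weight(step, total_steps, start=1.0, end=0.1):
    """Hypothetical schedule: cosine-interpolate the content weight from start to end."""
    # Clamp progress to [0, 1] so steps past the schedule keep the final weight.
    progress = min(max(step / total_steps, 0.0), 1.0)
    # Cosine ramp: 0 at progress = 0, 1 at progress = 1.
    ramp = 0.5 * (1.0 - math.cos(math.pi * progress))
    return start + (end - start) * ramp


def total_loss(content_loss, style_loss, step, total_steps):
    """Assumed objective: style term plus the annealed content term."""
    return style_loss + content_weight(step, total_steps) * content_loss
```

Under these assumptions, early iterations weight content preservation heavily and later iterations give the style term more influence. The actual schedule used by the authors may differ in shape and direction.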