We introduce HyperFields, a method for generating text-conditioned Neural Radiance Fields (NeRFs) with a single forward pass and, optionally, some fine-tuning. Our approach rests on two components: (i) a dynamic hypernetwork, which learns a smooth mapping from text token embeddings to the space of NeRFs; and (ii) NeRF distillation training, which distills scenes encoded in individual NeRFs into one dynamic hypernetwork. Together, these techniques enable a single network to fit over a hundred unique scenes. We further demonstrate that HyperFields learns a more general map between text and NeRFs, and is consequently capable of predicting novel in-distribution and out-of-distribution scenes, either zero-shot or with a few fine-tuning steps. Thanks to this learned general map, fine-tuning HyperFields converges faster and synthesizes novel scenes 5 to 10 times faster than existing neural optimization-based methods. Our ablation experiments show that both the dynamic architecture and NeRF distillation are critical to the expressivity of HyperFields.
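To make the two components concrete, here is a minimal NumPy sketch of a hypernetwork that predicts the weights of a small NeRF-style MLP from a text embedding, together with a distillation loss against a teacher output. It is an illustration under stated assumptions, not the authors' implementation. The abstract does not say what makes the hypernetwork "dynamic", so the sketch assumes that each layer's weights are conditioned on the text embedding and on the mean-pooled activations of the previous layer. The layer sizes, the linear hypernetwork heads, the per-point mean-squared error, and all function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only; none of these come from the paper.
TEXT_DIM, IN_DIM, HIDDEN, OUT_DIM = 8, 3, 16, 4  # OUT_DIM: RGB + density


def init_hyper_layer(cond_dim, fan_in, fan_out):
    """Linear hypernetwork head mapping a conditioning vector to the
    flattened weight matrix and bias of one NeRF MLP layer."""
    n_params = fan_in * fan_out + fan_out
    return rng.normal(0.0, 0.1, size=(cond_dim, n_params))


def predict_layer(hyper_w, cond, fan_in, fan_out):
    """Generate one NeRF layer's (W, b) from the conditioning vector."""
    flat = cond @ hyper_w
    W = flat[: fan_in * fan_out].reshape(fan_in, fan_out)
    b = flat[fan_in * fan_out:]
    return W, b


def hyper_nerf_forward(hyper, text_emb, points):
    """Evaluate the text-conditioned field at a batch of 3D points.

    Assumption: each layer is conditioned on the text embedding AND on
    the mean-pooled activations that feed into that layer.
    """
    h = points
    last = len(hyper) - 1
    for i, (hyper_w, (fan_in, fan_out)) in enumerate(hyper):
        cond = np.concatenate([text_emb, h.mean(axis=0)])
        W, b = predict_layer(hyper_w, cond, fan_in, fan_out)
        h = h @ W + b
        if i < last:
            h = np.maximum(h, 0.0)  # ReLU on hidden layers only
    return h


def distillation_loss(student_out, teacher_out):
    """Mean-squared error between student and teacher outputs
    (a stand-in for the paper's distillation objective)."""
    return float(np.mean((student_out - teacher_out) ** 2))


layer_shapes = [(IN_DIM, HIDDEN), (HIDDEN, OUT_DIM)]
hyper = [
    (init_hyper_layer(TEXT_DIM + fan_in, fan_in, fan_out), (fan_in, fan_out))
    for fan_in, fan_out in layer_shapes
]

text_emb = rng.normal(size=TEXT_DIM)                # placeholder text embedding
points = rng.uniform(-1.0, 1.0, size=(32, IN_DIM))  # sampled 3D positions
pred = hyper_nerf_forward(hyper, text_emb, points)  # shape (32, OUT_DIM)

teacher_out = np.zeros_like(pred)  # placeholder for a pre-trained per-scene NeRF
loss = distillation_loss(pred, teacher_out)
```

In this sketch a single set of hypernetwork parameters serves every prompt: changing `text_emb` changes the generated NeRF weights, which is what lets one network cover many scenes. Training would minimize `loss` over many (prompt, teacher NeRF) pairs; the gradient step is omitted here.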