The multi-plane representation has been highlighted for its fast training and inference across static and dynamic neural radiance fields. This approach constructs relevant features by projecting query points onto learnable grids and interpolating between adjacent vertices. However, despite its multi-resolution concept, it struggles to capture low-frequency details and, because of its bias toward fine details, tends to overuse parameters to represent low-frequency features. This phenomenon leads to instability and inefficiency when training poses are sparse. In this work, we propose a method that synergistically integrates the multi-plane representation with a coordinate-based MLP network known for its strong bias toward low-frequency signals. The coordinate-based network is responsible for capturing low-frequency details, while the multi-plane representation focuses on fine-grained details. We show that residual connections between the two seamlessly preserve their inherent properties. Additionally, the proposed progressive training scheme accelerates the disentanglement of these two features. Empirically, our method outperforms baseline models for both static and dynamic NeRFs with sparse inputs, achieving comparable results with fewer parameters.
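The core architectural idea above can be illustrated concretely. The following is a minimal NumPy sketch, not the authors' implementation: a 2D learnable feature plane sampled by bilinear interpolation of adjacent vertices stands in for the multi-plane branch, a tiny `tanh` MLP stands in for the low-frequency coordinate network, and the two are combined by a residual addition. All function names, shapes, and the single-plane simplification are illustrative assumptions.

```python
import numpy as np

def bilinear_interp(grid, xy):
    """Sample a learnable feature plane at continuous coordinates.

    grid: (H, W, C) plane features; xy: (N, 2) queries in [0, 1]^2.
    Interpolates the four vertices adjacent to each query point.
    """
    H, W, _ = grid.shape
    x = xy[:, 0] * (W - 1)
    y = xy[:, 1] * (H - 1)
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = np.minimum(x0 + 1, W - 1), np.minimum(y0 + 1, H - 1)
    wx, wy = x - x0, y - y0
    top = grid[y0, x0] * (1 - wx)[:, None] + grid[y0, x1] * wx[:, None]
    bot = grid[y1, x0] * (1 - wx)[:, None] + grid[y1, x1] * wx[:, None]
    return top * (1 - wy)[:, None] + bot * wy[:, None]

def coord_mlp(xy, W1, b1, W2, b2):
    """A tiny coordinate MLP; its spectral bias favors low frequencies."""
    h = np.tanh(xy @ W1 + b1)
    return h @ W2 + b2

def residual_features(xy, grid, W1, b1, W2, b2):
    """Residual combination: the plane contribution is added on top of
    the MLP output, so each branch keeps its own frequency bias."""
    return coord_mlp(xy, W1, b1, W2, b2) + bilinear_interp(grid, xy)
```

In a real radiance field, multiple resolutions and axis-aligned planes would be sampled and the fused feature decoded into density and color; the residual form shown here is what lets the grid specialize on fine detail while the MLP carries the smooth component.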