Implicit neural representation (INR) has recently emerged as a promising paradigm for signal representation. An INR is typically parameterized by a multilayer perceptron (MLP) that takes coordinates as input and outputs the corresponding attributes of a signal. However, MLP-based INRs face two critical issues: i) they process each coordinate individually and ignore the connections between coordinates; ii) they suffer from spectral bias and therefore fail to learn high-frequency components. Because target visual signals usually exhibit strong local structures and neighborhood dependencies, and high-frequency components are significant in these signals, both issues harm the representational capacity of INRs. This paper proposes Conv-INR, the first INR model fully based on convolution. Owing to the inherent properties of convolution, Conv-INR can simultaneously consider adjacent coordinates and learn high-frequency components effectively. Compared to existing MLP-based INRs, Conv-INR has better representational capacity and trainability without requiring primary function expansion. We conduct extensive experiments on four tasks, including image fitting, CT/MRI reconstruction, and novel view synthesis. Conv-INR significantly surpasses existing MLP-based INRs on all of them, validating its effectiveness. Finally, we propose three reparameterization methods that further enhance the performance of the vanilla Conv-INR without introducing any extra inference cost.
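The contrast between per-coordinate processing and neighborhood-aware processing can be illustrated with a small toy, offered here as a rough sketch and not as the authors' architecture. It assumes a single linear layer with one output channel and no activation. All names (`make_grid`, `pointwise`, `conv3x3`, `merge_branches`) are hypothetical. The last part shows the general idea behind reparameterization: a parallel branch used during training can be folded into one kernel, so inference costs the same as a single convolution. The abstract does not say which three methods the paper uses, so this branch merge is only a generic example of the concept.

```python
def make_grid(h, w):
    """Build an h x w grid of normalized (x, y) coordinates in [-1, 1]."""
    def norm(i, n):
        return 2.0 * i / (n - 1) - 1.0
    return [[(norm(c, w), norm(r, h)) for c in range(w)] for r in range(h)]

def pointwise(grid, weight, bias):
    """MLP-style layer: every output depends only on its own coordinate."""
    return [[weight[0] * x + weight[1] * y + bias for (x, y) in row] for row in grid]

def conv3x3(grid, kernel, bias):
    """Conv-style layer: every output mixes a 3x3 neighborhood (zero padding)."""
    h, w = len(grid), len(grid[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = bias
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w:
                        kx, ky = kernel[dr + 1][dc + 1]
                        x, y = grid[rr][cc]
                        acc += kx * x + ky * y
            out[r][c] = acc
    return out

def merge_branches(kernel, weight):
    """Fold a parallel pointwise branch into the center tap of a 3x3 kernel."""
    merged = [list(row) for row in kernel]
    kx, ky = merged[1][1]
    merged[1][1] = (kx + weight[0], ky + weight[1])
    return merged

def changed_positions(a, b, tol=1e-12):
    """Return the set of (row, col) positions where two outputs differ."""
    return {(r, c) for r in range(len(a)) for c in range(len(a[0]))
            if abs(a[r][c] - b[r][c]) > tol}

# Illustrative weights (assumptions, not values from the paper).
grid = make_grid(5, 5)
weight = (0.7, -0.3)
kernel = [[(0.1 * (i + 1), 0.2 * (j + 1)) for j in range(3)] for i in range(3)]

# Perturb the single coordinate at the center of the grid.
perturbed = [list(row) for row in grid]
perturbed[2][2] = (0.5, 0.5)

# The pointwise layer reacts at one position; the conv layer reacts at nine.
pw_changed = changed_positions(pointwise(grid, weight, 0.1),
                               pointwise(perturbed, weight, 0.1))
cv_changed = changed_positions(conv3x3(grid, kernel, 0.1),
                               conv3x3(perturbed, kernel, 0.1))

# Two training-time branches versus one merged inference-time kernel.
two_branch = [[a + b for a, b in zip(ra, rb)]
              for ra, rb in zip(conv3x3(grid, kernel, 0.1),
                                pointwise(grid, weight, 0.2))]
one_branch = conv3x3(grid, merge_branches(kernel, weight), 0.1 + 0.2)
max_gap = max(abs(a - b) for ra, rb in zip(two_branch, one_branch)
              for a, b in zip(ra, rb))
```

Here `pw_changed` contains only the perturbed position, while `cv_changed` covers its whole 3x3 neighborhood, and `max_gap` stays at floating-point noise because both branches are linear in the input.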