Existing approaches to Implicit Neural Representation (INR) can be interpreted as representing a scene globally through a linear combination of Fourier bases of different frequencies. However, because every basis function spans the entire domain, it also contributes in local regions where that frequency component is unnecessary, which limits representation capability and produces artifacts. To address this, we introduce a learnable spatial mask that dispatches distinct Fourier bases to their respective regions. This amounts to collaging Fourier patches, enabling an accurate representation of complex signals. Comprehensive experiments demonstrate that the proposed approach achieves superior reconstruction quality over existing baselines across various INR tasks, including image fitting, video representation, and 3D shape representation. In particular, it improves image fitting PSNR by over 3 dB and reaches 98.81 IoU and 0.0011 Chamfer Distance on 3D reconstruction.
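The idea of dispatching Fourier bases to regions can be illustrated with a small 1D toy example. This is only a sketch under stated assumptions, not the architecture used in the paper: the function names (`fourier_bases`, `masked_reconstruction`), the softmax form of the mask, the two hand-picked frequencies, and the hand-set mask logits are all hypothetical, and the mask is fixed by hand here rather than learned. The target signal is low-frequency on the left half of the domain and high-frequency on the right half, and the sketch compares a global linear combination of the bases with a spatially masked one.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def fourier_bases(x, freqs):
    # x: (N,) coordinates, freqs: (K,) frequencies -> (N, K) sine bases.
    return np.sin(2.0 * np.pi * x[:, None] * freqs[None, :])

def masked_reconstruction(x, freqs, coeffs, mask_logits):
    # Each location gets its own soft assignment over the K bases
    # (assumed softmax mask), so a basis only contributes where selected.
    bases = fourier_bases(x, freqs)          # (N, K)
    mask = softmax(mask_logits, axis=1)      # (N, K), rows sum to 1
    signal = (mask * bases * coeffs[None, :]).sum(axis=1)
    return signal, mask

# Toy signal: frequency 2 on the left half, frequency 16 on the right half.
x = np.linspace(0.0, 1.0, 200, endpoint=False)
freqs = np.array([2.0, 16.0])
left = x < 0.5
target = np.where(left, np.sin(2 * np.pi * 2 * x), np.sin(2 * np.pi * 16 * x))

# Global fit: one coefficient per basis, shared by every location.
B = fourier_bases(x, freqs)
global_coeffs = np.linalg.lstsq(B, target, rcond=None)[0]
global_mse = np.mean((B @ global_coeffs - target) ** 2)

# Masked fit: logits are set by hand for illustration (a real model
# would learn them); each half of the domain selects one basis.
mask_logits = np.where(left[:, None], [20.0, -20.0], [-20.0, 20.0])
recon, mask = masked_reconstruction(x, freqs, np.ones(2), mask_logits)
masked_mse = np.mean((recon - target) ** 2)

print("global MSE:", global_mse)
print("masked MSE:", masked_mse)
```

In this toy setting the global combination has to trade off the two halves against each other, whereas the masked version lets each region use only the basis it needs, which is the intuition behind "collaging Fourier patches".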