Recent years have witnessed significant advances in light field image super-resolution (LFSR), owing to the progress of modern neural networks. However, existing methods either struggle to capture long-range dependencies (CNN-based) or incur quadratic computational complexity (Transformer-based), which limits their performance. Recently, the State Space Model (SSM) with a selective scanning mechanism (S6), exemplified by Mamba, has emerged as a superior alternative to traditional CNN- and Transformer-based approaches in various vision tasks, benefiting from its effective long-range sequence modeling capability and linear-time complexity. Integrating S6 into LFSR is therefore compelling, especially considering the vast data volume of 4D light fields (LFs). The primary challenge, however, lies in \emph{designing an appropriate scanning method for 4D light fields that effectively models light field features}. To tackle this, we apply SSMs to the informative 2D slices of 4D LFs to fully explore spatial contextual information, complementary angular information, and structure information. To this end, we carefully devise a basic SSM block built on an efficient SS2D mechanism that enables more effective and efficient feature learning on these 2D slices. Based on these two designs, we further introduce an SSM-based network for LFSR, termed LFMamba. Experimental results on LF benchmarks demonstrate the superior performance of LFMamba. Furthermore, extensive ablation studies validate the efficacy and generalization ability of the proposed method. We expect LFMamba to shed light on effective representation learning of LFs with state space models.
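To make the notion of "2D slices of a 4D LF" concrete, the following is a minimal illustrative sketch, not the LFMamba implementation. It assumes a light field stored as a nested list indexed as `lf[u][v][h][w]`, with angular coordinates `(u, v)` and spatial coordinates `(h, w)`. The function names are hypothetical. Pairing "structure information" with epipolar-plane images (EPIs) is an assumption drawn from common LF practice rather than something stated above. The plain row-by-row `raster_scan` is only a stand-in for the SS2D scanning order, which is not specified here.

```python
# Illustrative sketch only: extracting 2D slices from a 4D light field.
# Assumed layout (not from the source): lf[u][v][h][w], where (u, v) are
# angular indices and (h, w) are spatial indices.

def spatial_slice(lf, u, v):
    """Sub-aperture image for view (u, v): an H x W slice."""
    return lf[u][v]

def angular_slice(lf, h, w):
    """Macro-pixel at spatial location (h, w): a U x V slice."""
    return [[lf[u][v][h][w] for v in range(len(lf[0]))]
            for u in range(len(lf))]

def epi_horizontal(lf, u, h):
    """Horizontal EPI with (u, h) fixed: a V x W slice."""
    return [lf[u][v][h] for v in range(len(lf[0]))]

def epi_vertical(lf, v, w):
    """Vertical EPI with (v, w) fixed: a U x H slice."""
    return [[lf[u][v][h][w] for h in range(len(lf[0][0]))]
            for u in range(len(lf))]

def raster_scan(slice_2d):
    """Flatten a 2D slice row by row into a 1D sequence.

    A sequence model such as an SSM consumes 1D sequences, so some scan
    order is needed; this simple raster order is a placeholder.
    """
    return [value for row in slice_2d for value in row]

if __name__ == "__main__":
    # Toy light field with 2 x 3 views of 4 x 5 pixels; each entry stores
    # its own coordinates so the slicing is easy to inspect.
    U, V, H, W = 2, 3, 4, 5
    lf = [[[[(u, v, h, w) for w in range(W)] for h in range(H)]
           for v in range(V)] for u in range(U)]
    sequence = raster_scan(angular_slice(lf, 0, 0))
    print(len(sequence))  # one token per view
```

Each slice type fixes two of the four coordinates and varies the other two, so the length of any scanned sequence is the product of two dimensions rather than all four.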