Less is More: Skim Transformer for Light Field Image Super-resolution

A light field image captures a scene through a micro-lens array, providing a rich representation that encompasses both spatial and angular information. This richness comes at the cost of significant data redundancy, yet most existing methods indiscriminately use the information from all sub-aperture images (SAIs), attempting to harness every visual cue regardless of its disparity significance. This paradigm inevitably leads to disparity entanglement, a fundamental cause of inefficiency in light field image processing. To address this limitation, we introduce the Skim Transformer, a novel architecture inspired by the "less is more" philosophy. It features a multi-branch structure in which each branch is dedicated to a specific disparity range and constructs its attention score matrix over a skimmed subset of SAIs rather than all of them. Building upon it, we present SkimLFSR, an efficient yet powerful network for light field image super-resolution. Requiring only 67% of the prior leading method's parameters, SkimLFSR achieves state-of-the-art results, surpassing the best existing method by 0.63 dB and 0.35 dB PSNR on the 2× and 4× tasks, respectively. Through in-depth analyses, we show that SkimLFSR, guided by the predefined skimmed SAI sets as prior knowledge, exhibits distinct disparity-aware behaviors when attending to visual cues. Finally, we validate SkimLFSR's generalizability across angular resolutions: it achieves competitive performance at a larger angular resolution without any retraining or major network modifications. These findings highlight its effectiveness and adaptability as a promising paradigm for light field image processing.
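To make the idea of building an attention score matrix over a skimmed subset of SAIs more concrete, the following is a minimal sketch in plain Python. It is not the SkimLFSR implementation. The skimming rule (keep the views within a Chebyshev radius of the centre view), the helper names `skim_indices` and `skim_attention`, and the use of raw features as queries, keys, and values without learned projections are all assumptions made for illustration.

```python
import math


def skim_indices(angular_res, radius):
    """Hypothetical skimming rule (an assumption, not taken from the paper):
    keep the views whose Chebyshev distance to the centre view is <= radius."""
    c = angular_res // 2
    return [
        u * angular_res + v
        for u in range(angular_res)
        for v in range(angular_res)
        if max(abs(u - c), abs(v - c)) <= radius
    ]


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def skim_attention(tokens, keep):
    """Scaled dot-product self-attention restricted to the views in `keep`.

    tokens: one feature vector per SAI (raw features stand in for the
            learned query/key/value projections, which are omitted here).
    Returns (weights, outputs) for the kept views only.
    """
    sub = [tokens[i] for i in keep]
    d = len(sub[0])
    scores = [
        [sum(q[k] * key[k] for k in range(d)) / math.sqrt(d) for key in sub]
        for q in sub
    ]
    weights = [softmax(row) for row in scores]
    outputs = [
        [sum(w * val[k] for w, val in zip(row, sub)) for k in range(d)]
        for row in weights
    ]
    return weights, outputs


# Toy 5x5 light field: one 2-D feature per SAI at a single spatial location.
ANGULAR_RES = 5
tokens = [[math.sin(i), math.cos(i)] for i in range(ANGULAR_RES * ANGULAR_RES)]

# One branch per (hypothetical) skimmed set; radius 2 covers every view.
branches = {
    radius: skim_attention(tokens, skim_indices(ANGULAR_RES, radius))
    for radius in (1, 2)
}
```

In this toy setting the radius-1 branch builds a 9×9 score matrix, while the radius-2 branch, which keeps all 25 views, builds a 25×25 one. The saving from skimming therefore grows quadratically with the number of views dropped. How each skimmed set is assigned to a disparity range is specified by the method itself and is not modeled here.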
