Recent learning-based approaches have achieved significant progress in light field (LF) image super-resolution (SR) by exploring convolution-based or transformer-based network structures. However, LF imaging has many intrinsic physical priors that have not been fully exploited. In this paper, we analyze the coordinate transformation of the LF imaging process to reveal the geometric relationships in LF images. Based on these geometric priors, we introduce a new LF subspace of virtual-slit images (VSI) that provide sub-pixel information complementary to sub-aperture images. To leverage the abundant correlation across the four-dimensional data with manageable complexity, we propose learning an ensemble representation of all $C_4^2 = 6$ two-dimensional LF subspaces for more effective feature extraction. To super-resolve image structures from undersampled LF data, we propose a geometry-aware decoder, named EPIXformer, which constrains the transformer's search regions with an LF physical prior. Experimental results on both spatial and angular SR tasks demonstrate that the proposed method outperforms other state-of-the-art schemes, especially in handling various disparities.
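To make the count of $C_4^2$ subspaces concrete, the following minimal sketch enumerates every pair of axes of a four-dimensional LF and extracts the corresponding 2D slice. It is an illustration under stated assumptions, not the method of this paper: the axis names `u, v` (angular) and `x, y` (spatial), the helper names `subspaces` and `slice_2d`, and the toy accessor `toy_lf` are all hypothetical. The labels for sub-aperture, macro-pixel, and epipolar-plane slices follow common LF usage; mapping the two remaining mixed pairs to the VSI subspace is an assumption that the abstract itself does not confirm.

```python
from itertools import combinations

# Assumed axis naming: (u, v) are angular coordinates, (x, y) are spatial ones.
AXES = ("u", "v", "x", "y")

# Labels for each axis pair. The first four follow common LF terminology;
# the two "assumed VSI" labels are a guess and are not stated in the abstract.
LABELS = {
    ("u", "v"): "macro-pixel image",
    ("u", "x"): "epipolar-plane image (EPI)",
    ("u", "y"): "mixed pair (assumed VSI)",
    ("v", "x"): "mixed pair (assumed VSI)",
    ("v", "y"): "epipolar-plane image (EPI)",
    ("x", "y"): "sub-aperture image (SAI)",
}


def subspaces(axes=AXES):
    """Return every unordered pair of axes: C(4, 2) = 6 two-dimensional subspaces."""
    return list(combinations(axes, 2))


def slice_2d(lf, shape, free, fixed):
    """Extract the 2D slice spanned by the two `free` axes.

    lf    -- callable lf(u=..., v=..., x=..., y=...) returning one sample
    shape -- dict mapping axis name to its size
    free  -- pair of axis names that vary across the slice
    fixed -- dict mapping each of the other two axes to a fixed index
    """
    a, b = free
    return [
        [lf(**{**fixed, a: i, b: j}) for j in range(shape[b])]
        for i in range(shape[a])
    ]


def toy_lf(u, v, x, y):
    """Toy light field whose sample value encodes its own coordinates."""
    return 1000 * u + 100 * v + 10 * x + y


shape = {"u": 3, "v": 3, "x": 4, "y": 4}
for free in subspaces():
    # Fix the two remaining axes at index 0 and take the slice.
    fixed = {ax: 0 for ax in AXES if ax not in free}
    s = slice_2d(toy_lf, shape, free, fixed)
    print(free, LABELS[free], f"{len(s)}x{len(s[0])}")
```

Each of the six slices views the same 4D data along a different pair of axes, which is why an ensemble over all of them can draw on correlations that any single subspace misses.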