Video Frame Interpolation (VFI) has been extensively explored and demonstrated, yet its application to polarization imaging remains largely unexplored. Because polarizing filters selectively transmit light, longer exposure times are typically required to ensure sufficient light intensity, which consequently lowers the temporal sampling rate. Furthermore, because the polarization reflected by objects varies with the shooting perspective, estimating pixel displacement alone is insufficient to accurately reconstruct the intermediate polarization. To tackle these challenges, this study proposes a multi-stage, multi-scale network called Swin-VFI, built on the Swin Transformer, and introduces a tailored loss function to help the network capture polarization changes. To verify the practicality of the proposed method, this study evaluates its interpolated frames on Shape from Polarization (SfP) and Human Shape Reconstruction tasks, comparing them with state-of-the-art methods such as CAIN, FLAVR, and VFIT. Experimental results demonstrate our approach's superior reconstruction accuracy across all tasks.