Video snapshot compressive imaging (SCI) aims to capture a sequence of video frames in a single shot of a 2D detector. It rests on two components: optical modulation patterns (also known as masks) and a computational reconstruction algorithm. Advanced deep learning algorithms and mature hardware are bringing video SCI into practical applications. Yet two problems remain: i) low dynamic range, a consequence of high temporal multiplexing, and ii) the degradation of existing deep learning algorithms on real systems. To address these challenges, this paper presents a deep optics framework that jointly optimizes the masks and a reconstruction network. Specifically, we first propose a new type of structural mask to realize motion-aware, full-dynamic-range measurement. Exploiting the motion awareness in the measurement domain, we develop an efficient Transformer-based network for video SCI reconstruction, dubbed Res2former, which captures long-term temporal dependencies. Moreover, we introduce the sensor response into the forward model of video SCI so that end-to-end training stays close to the real system. Finally, we implement the learned structural masks on a digital micro-mirror device. Experimental results on synthetic and real data validate the effectiveness of the proposed framework. We believe this is a milestone for real-world video SCI. The source code and data are available at https://github.com/pwangcs/DeepOpticsSCI.
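To make the dynamic-range problem concrete, the following is a minimal sketch of a generic video SCI forward model: each frame is modulated by its mask and the results are summed into one snapshot. The sensor response here (saturation followed by uniform quantization) is an assumption chosen for illustration, not the response model used in the paper, and the names `sci_forward`, `full_well`, and `bit_depth` are hypothetical.

```python
import numpy as np


def sci_forward(frames, masks, full_well=1.0, bit_depth=8):
    """Simulate one snapshot measurement from T frames and T masks.

    frames, masks: arrays of shape (T, H, W) with values in [0, 1].
    full_well, bit_depth: hypothetical sensor parameters (assumptions).
    Returns a single (H, W) measurement.
    """
    frames = np.asarray(frames, dtype=np.float64)
    masks = np.asarray(masks, dtype=np.float64)
    if frames.shape != masks.shape:
        raise ValueError("frames and masks must share the shape (T, H, W)")

    # Temporal multiplexing: modulate each frame, then integrate over time.
    exposure = np.sum(masks * frames, axis=0)

    # Assumed sensor response: saturate at full_well, then quantize.
    clipped = np.clip(exposure, 0.0, full_well)
    levels = 2 ** bit_depth - 1
    return np.round(clipped / full_well * levels) / levels * full_well


# Eight mid-gray frames, all mask pixels open: the sum saturates the sensor.
frames = np.full((8, 4, 4), 0.5)
open_masks = np.ones((8, 4, 4))
saturated = sci_forward(frames, open_masks)

# Only the first frame is let through: the measurement stays in range.
single_masks = np.zeros((8, 4, 4))
single_masks[0] = 1.0
in_range = sci_forward(frames, single_masks)
```

In this sketch, summing many modulated frames pushes the exposure past the sensor's range, which is one way to see why high temporal multiplexing costs dynamic range and why a forward model without a sensor response can drift from a real system.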