The spike camera, with its high temporal resolution, low latency, and high dynamic range, addresses high-speed imaging challenges such as motion blur. It captures photons at each pixel independently, creating binary spike streams that are rich in temporal information but challenging to reconstruct images from. Current algorithms, both traditional and deep learning-based, still fall short in exploiting this rich temporal detail and in restoring fine detail in the reconstructed images. To overcome this, we introduce Swin Spikeformer (SwinSF), a novel model for dynamic scene reconstruction from spike streams. SwinSF is composed of a Spike Feature Extraction module, a Spatial-Temporal Feature Extraction module, and a Final Reconstruction module. It combines shifted window self-attention with the proposed temporal spike attention, ensuring comprehensive feature extraction that encapsulates both spatial and temporal dynamics and leads to a more robust and accurate reconstruction of spike streams. Furthermore, we build a new synthesized dataset for spike image reconstruction that matches the resolution of the latest spike camera, ensuring its relevance and applicability to the latest developments in spike camera imaging. Experimental results demonstrate that the proposed network SwinSF sets a new benchmark, achieving state-of-the-art performance across a series of datasets, including both real-world and synthesized data at various resolutions. Our code and proposed dataset will be available soon.
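To make the imaging model concrete: a spike camera pixel accumulates incoming light and emits a binary spike whenever the accumulated charge crosses a threshold, so brighter pixels fire more often. The sketch below is a minimal, hedged illustration of this integrate-and-fire process and of a simple spike-counting reconstruction baseline (a TFP-style mean firing rate); it is not the SwinSF model, and the function names and parameters are our own illustrative choices.

```python
import numpy as np

def simulate_spike_stream(intensity, n_steps, threshold=1.0):
    """Toy integrate-and-fire spike camera model.

    intensity: (H, W) array of per-pixel light intensity in [0, 1].
    Returns a binary (n_steps, H, W) spike stream.
    """
    acc = np.zeros_like(intensity, dtype=float)
    spikes = np.zeros((n_steps,) + intensity.shape, dtype=np.uint8)
    for t in range(n_steps):
        acc += intensity                  # accumulate light each time step
        fired = acc >= threshold
        spikes[t] = fired
        acc[fired] -= threshold          # reset-by-subtraction on firing
    return spikes

def reconstruct_counting(spikes, threshold=1.0):
    """Spike-counting baseline: the mean firing rate over the window
    approximates the (normalized) scene intensity."""
    return spikes.mean(axis=0) * threshold

# Usage: a dark pixel (0.2) and a bright pixel (0.8)
scene = np.array([[0.2, 0.8]])
stream = simulate_spike_stream(scene, n_steps=100)
recon = reconstruct_counting(stream)     # close to the original intensities
```

Such counting baselines trade temporal resolution for noise robustness (longer windows blur motion), which is precisely the limitation that learned spatio-temporal models aim to overcome.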