Event-based cameras are becoming increasingly popular for their ability to capture high-speed motion with low latency and high dynamic range. However, generating videos from events remains challenging because event data are sparse and highly variable. To address this, we propose HyperE2VID, a dynamic neural network architecture for event-based video reconstruction. Our approach uses hypernetworks to generate per-pixel adaptive filters, guided by a context fusion module that combines information from event voxel grids and previously reconstructed intensity images. We also employ a curriculum learning strategy to train the network more robustly. Comprehensive experiments on several benchmark datasets show that HyperE2VID not only surpasses current state-of-the-art methods in reconstruction quality but also does so with fewer parameters, lower computational requirements, and faster inference.
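To make the idea of context-guided, per-pixel adaptive filtering more concrete, the following is a minimal pure-Python sketch and not the HyperE2VID implementation. It assumes a toy setup: the context at each pixel is simply the event voxel bins stacked with the previous reconstruction, the "hypernetwork" is a single linear map to nine filter taps, and a softmax normalizes each 3x3 kernel. All function names (`fuse_context`, `generate_filters`, `apply_per_pixel_filters`, `demo`), shapes, and the softmax choice are illustrative assumptions rather than details stated in the abstract.

```python
import math
import random


def fuse_context(voxel, prev_image):
    """Stack event voxel bins and the previous reconstruction per pixel.

    voxel: list of B grids (each H x W); prev_image: one H x W grid.
    Returns an H x W grid of context vectors of length B + 1.
    (Hypothetical stand-in for a learned context fusion module.)
    """
    height, width = len(prev_image), len(prev_image[0])
    return [
        [[grid[y][x] for grid in voxel] + [prev_image[y][x]] for x in range(width)]
        for y in range(height)
    ]


def generate_filters(context, weights, bias):
    """Toy hypernetwork: a linear map from each context vector to 9 filter taps.

    The softmax keeps every 3x3 kernel positive and summing to 1. This is an
    assumption made for readability, not something the abstract specifies.
    """
    filters = []
    for row in context:
        out_row = []
        for ctx in row:
            logits = [
                sum(w * c for w, c in zip(weights[k], ctx)) + bias[k]
                for k in range(9)
            ]
            peak = max(logits)  # subtract the max for numerical stability
            exps = [math.exp(v - peak) for v in logits]
            total = sum(exps)
            out_row.append([e / total for e in exps])
        filters.append(out_row)
    return filters


def apply_per_pixel_filters(features, filters):
    """Filter a feature map with a different 3x3 kernel at every pixel.

    Neighbors outside the image are treated as zero (zero padding).
    """
    height, width = len(features), len(features[0])
    output = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            kernel = filters[y][x]
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < height and 0 <= xx < width:
                        acc += kernel[(dy + 1) * 3 + (dx + 1)] * features[yy][xx]
            output[y][x] = acc
    return output


def demo(height=4, width=5, bins=3, seed=0):
    """Run the toy pipeline on random inputs and return its pieces."""
    rng = random.Random(seed)
    voxel = [
        [[rng.uniform(-1, 1) for _ in range(width)] for _ in range(height)]
        for _ in range(bins)
    ]
    prev_image = [[rng.random() for _ in range(width)] for _ in range(height)]
    features = [[rng.random() for _ in range(width)] for _ in range(height)]
    # Random weights stand in for parameters that would normally be learned.
    weights = [[rng.gauss(0, 0.5) for _ in range(bins + 1)] for _ in range(9)]
    bias = [0.0] * 9
    context = fuse_context(voxel, prev_image)
    filters = generate_filters(context, weights, bias)
    return features, filters, apply_per_pixel_filters(features, filters)
```

Calling `demo()` returns the input features, the generated kernels, and the filtered output. The point of the sketch is that the kernel at each pixel depends on the local context, so regions with different event activity or different previous intensities are filtered differently, in contrast to a standard convolution that shares one kernel across the whole image.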