Aerial surveillance requires high spatio-temporal resolution (HSTR) video for accurate detection and tracking of objects. This is especially true for wide-area surveillance (WAS), where the surveyed region is large and the objects of interest are small. This paper proposes a dual-camera system that generates HSTR video using reference-based super-resolution (RefSR). One camera captures high-spatial-resolution, low-frame-rate (HSLF) video while the other simultaneously captures low-spatial-resolution, high-frame-rate (LSHF) video of the same scene. A novel deep learning architecture fuses the HSLF and LSHF video feeds and synthesizes HSTR video frames at the output. The proposed model combines optical flow estimation with channel-wise and spatial attention mechanisms to capture fine motion and intricate dependencies between frames of the two video feeds. Simulations show that the proposed model significantly improves on existing RefSR techniques in terms of PSNR and SSIM. The method also achieves a frame rate (frames per second, FPS) sufficient for WAS when deployed on a power-constrained drone equipped with dual cameras.
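For readers unfamiliar with the PSNR metric named above, the following is a minimal sketch of its standard definition, 10·log10(MAX² / MSE), applied to two equally sized grayscale frames. It is not the authors' evaluation code: the function name `psnr`, the nested-list frame representation, and the default 8-bit peak value of 255 are assumptions made for illustration only.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio (dB) between two equally sized grayscale frames.

    Frames are given as nested lists of pixel intensities (an illustrative
    assumption); max_value is the peak intensity, 255 for 8-bit images.
    """
    if len(reference) != len(estimate) or any(
        len(ref_row) != len(est_row) for ref_row, est_row in zip(reference, estimate)
    ):
        raise ValueError("frames must have the same shape")

    squared_error = 0.0
    pixel_count = 0
    for ref_row, est_row in zip(reference, estimate):
        for ref_pixel, est_pixel in zip(ref_row, est_row):
            squared_error += (ref_pixel - est_pixel) ** 2
            pixel_count += 1

    if pixel_count == 0:
        raise ValueError("frames must not be empty")

    mse = squared_error / pixel_count
    if mse == 0:
        # Identical frames: PSNR is unbounded.
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical 2x2 frames: ground truth vs. a synthesized frame.
ground_truth = [[100, 100], [100, 100]]
synthesized = [[110, 90], [100, 100]]
score_db = psnr(ground_truth, synthesized)
```

A higher value indicates that the synthesized frame is closer to the ground truth. SSIM, the other metric mentioned, compares local luminance, contrast, and structure, and is usually taken from an image-processing library rather than written by hand.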