The strong temporal consistency of surveillance video lets traditional codecs compress it very efficiently, but downstream vision applications still operate on the decoded image frames, which have a high data rate. Because applications cannot easily extract information about temporal redundancy from compressed video representations, we propose a novel system that conveys temporal redundancy within a sparse decompressed representation. We leverage a video representation framework called ADDER to transcode framed videos into sparse, asynchronous intensity samples. We introduce mechanisms for content adaptation, lossy compression, and asynchronous forms of classical vision algorithms. We evaluate our system on the VIRAT surveillance video dataset and show a median 43.7% speed improvement in FAST feature detection compared to OpenCV. We run the same algorithm as OpenCV but process only the pixels that receive new asynchronous events, rather than every pixel in an image frame. Our work paves the way for upcoming neuromorphic sensors and is amenable to future applications with spiking neural networks.
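The core idea in the last two technical sentences, running an unchanged frame-based detector but testing only pixels that receive new events, can be sketched in Python. This is a minimal, hypothetical illustration under stated assumptions, not the authors' implementation, ADDER's API, or OpenCV's code. It assumes a simple thresholded change detector (`emit_events`, with a hypothetical lossy tolerance `delta`) as a stand-in for ADDER's asynchronous intensity samples, and it implements the plain FAST-9 segment test (arbitrary threshold `t`, no non-maximum suppression). All function names are illustrative.

```python
# Hypothetical sketch of event-driven FAST-9 corner detection.
# The event model below is an assumption, NOT ADDER's actual representation.

# 16-pixel Bresenham circle of radius 3 used by the FAST segment test, in order.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]


def is_fast_corner(img, x, y, t=20, n=9):
    """Segment test: n contiguous circle pixels all brighter or all darker than center by > t."""
    h, w = len(img), len(img[0])
    if x < 3 or y < 3 or x >= w - 3 or y >= h - 3:
        return False  # the circle would leave the image
    c = img[y][x]
    # Label each circle pixel: +1 brighter, -1 darker, 0 similar.
    labels = []
    for dx, dy in CIRCLE:
        v = img[y + dy][x + dx]
        labels.append(1 if v > c + t else (-1 if v < c - t else 0))
    for sign in (1, -1):
        run = 0
        for lab in labels + labels:  # doubled list handles wrap-around
            run = run + 1 if lab == sign else 0
            if run >= n:
                return True
    return False


def full_frame_fast(img, t=20):
    """Frame-based baseline: run the segment test on every pixel."""
    return {(x, y) for y in range(len(img)) for x in range(len(img[0]))
            if is_fast_corner(img, x, y, t)}


def emit_events(recon, frame, delta=0):
    """Hypothetical event model: a pixel fires when it differs from its last
    reconstructed value by more than delta (delta > 0 makes it lossy).
    recon is updated in place. This whole-frame scan stands in for the
    transcoder; a downstream application would see only the returned events."""
    events = []
    for y, row in enumerate(frame):
        for x, v in enumerate(row):
            if abs(v - recon[y][x]) > delta:
                recon[y][x] = v
                events.append((x, y))
    return events


def update_corners(recon, corners, events, t=20, include_neighbors=True):
    """Re-run the segment test only where events arrived; returns #pixels tested.

    A pixel's result depends on its own value and its 16 circle pixels, so with
    include_neighbors=True we also re-test every pixel whose circle contains an
    event pixel, which keeps the result identical to a full pass over recon."""
    to_test = set(events)
    if include_neighbors:
        for x, y in events:
            for dx, dy in CIRCLE:
                to_test.add((x - dx, y - dy))
    h, w = len(recon), len(recon[0])
    for x, y in to_test:
        if not (0 <= x < w and 0 <= y < h):
            continue
        if is_fast_corner(recon, x, y, t):
            corners.add((x, y))
        else:
            corners.discard((x, y))
    return len(to_test)


# Usage: a uniform 16x16 scene in which a bright 4x4 square appears.
W = H = 16
background = [[10] * W for _ in range(H)]
recon = [row[:] for row in background]
corners = full_frame_fast(recon)  # one full pass over the first frame
frame = [row[:] for row in background]
for yy in range(6, 10):
    for xx in range(6, 10):
        frame[yy][xx] = 200
events = emit_events(recon, frame)
tested = update_corners(recon, corners, events)
print(len(events), tested, sorted(corners), corners == full_frame_fast(recon))
```

Re-testing only the event pixels themselves (`include_neighbors=False`) is cheaper, but it can leave stale results next to changed content, because a pixel's FAST response also depends on its 16-pixel circle. Also re-testing the pixels whose circle covers an event keeps the output identical to a full pass over the reconstructed image, at a cost of at most 17 segment tests per event.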