The strong temporal consistency of surveillance video enables compelling compression performance with traditional methods, but downstream vision applications operate on decoded image frames with a high data rate. Because it is not straightforward for applications to extract information about temporal redundancy from compressed video representations, we propose a novel system that conveys temporal redundancy within a sparse decompressed representation. We leverage a video representation framework called ADDER to transcode framed videos into sparse, asynchronous intensity samples. We introduce mechanisms for content adaptation, lossy compression, and asynchronous forms of classical vision algorithms. We evaluate our system on the VIRAT surveillance video dataset and show a median 43.7% speed improvement in FAST feature detection compared to OpenCV. We run the same algorithm as OpenCV but process only the pixels that receive new asynchronous events, rather than every pixel in each image frame. Our work paves the way for upcoming neuromorphic sensors and is amenable to future applications with spiking neural networks.
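The event-driven idea behind the FAST result (run the same per-pixel test, but only where new events arrive) can be illustrated with a small sketch. This is not the authors' implementation. It assumes a crude frame-differencing stand-in, `frame_to_events` (hypothetical), in place of ADDER transcoding, and this stand-in ignores the timing information that real ADDER events carry. It also uses a plain FAST-9 segment test on the standard radius-3, 16-pixel circle instead of calling OpenCV. The names `is_fast_corner`, `EventFast`, and the thresholds are illustrative assumptions.

```python
from typing import List, Set, Tuple

Image = List[List[int]]

# Bresenham circle of radius 3 used by the FAST segment test (16 pixels, in order).
CIRCLE: List[Tuple[int, int]] = [
    (0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
    (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3),
]


def is_fast_corner(img: Image, x: int, y: int, t: int = 20, n: int = 9) -> bool:
    """FAST-n segment test: n contiguous circle pixels all brighter or all darker."""
    h, w = len(img), len(img[0])
    if x < 3 or y < 3 or x >= w - 3 or y >= h - 3:
        return False  # the circle would leave the image
    p = img[y][x]
    # +1 = brighter than p + t, -1 = darker than p - t, 0 = similar
    states = []
    for dx, dy in CIRCLE:
        v = img[y + dy][x + dx]
        states.append(1 if v > p + t else (-1 if v < p - t else 0))
    # Scan the circle twice so runs that wrap from index 15 back to 0 are counted.
    for target in (1, -1):
        run = 0
        for s in states + states:
            run = run + 1 if s == target else 0
            if run >= n:
                return True
    return False


def frame_to_events(prev: Image, curr: Image, threshold: int = 10) -> List[Tuple[int, int, int]]:
    """Hypothetical stand-in for transcoding: emit (x, y, new_intensity) for changed pixels."""
    events = []
    for y, (row_prev, row_curr) in enumerate(zip(prev, curr)):
        for x, (a, b) in enumerate(zip(row_prev, row_curr)):
            if abs(b - a) > threshold:
                events.append((x, y, b))
    return events


class EventFast:
    """Keeps a reconstructed image and a corner set, updated only at event pixels."""

    def __init__(self, first_frame: Image, t: int = 20) -> None:
        self.img = [row[:] for row in first_frame]
        self.t = t
        h, w = len(self.img), len(self.img[0])
        # One full pass to initialize, as a frame-based detector does on every frame.
        self.corners: Set[Tuple[int, int]] = {
            (x, y) for y in range(h) for x in range(w) if is_fast_corner(self.img, x, y, t)
        }

    def process(self, events: List[Tuple[int, int, int]]) -> int:
        """Apply events, then re-test only the event pixels; return how many were tested."""
        for x, y, v in events:
            self.img[y][x] = v
        for x, y, _ in events:
            if is_fast_corner(self.img, x, y, self.t):
                self.corners.add((x, y))
            else:
                self.corners.discard((x, y))
        return len(events)
```

For example, on a uniform 16×16 frame in which a single pixel brightens, `frame_to_events` yields one event, so `process` runs the segment test once instead of 256 times. Following the abstract, the sketch re-tests only pixels that receive events. A stricter variant would also re-test pixels within radius 3 of each event, because their circles include the changed pixel.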