This paper proposes a low-latency neural network architecture for event-based dense prediction tasks. Conventional architectures encode the entire scene content at a fixed rate regardless of its temporal characteristics. Instead, the proposed network encodes each piece of content at a temporal scale suited to its movement speed. We achieve this by constructing a temporal hierarchy from stacked latent memories that operate at different rates. Given low-latency event streams, the multi-level memories progressively extract scene contents ranging from dynamic to static by propagating information from the fast memory modules to the slow ones. The architecture not only reduces the redundancy of conventional architectures but also exploits long-term dependencies. Furthermore, an attention-based event representation efficiently encodes sparse event streams into the memory cells. We conduct extensive evaluations on three event-based dense prediction tasks, where the proposed approach outperforms existing methods in both accuracy and latency while demonstrating effective event and image fusion capabilities. The code is available at https://hamarh.github.io/hmnet/
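The idea of stacked memories running at different rates can be illustrated with a toy sketch. This is not the authors' implementation: the `MemoryLevel` class, the `step` function, the update periods (1, 4, 16), and the exponential-moving-average write rule are all assumptions chosen for illustration, and a single scalar stands in for each latent feature map. The abstract's attention-based event encoding is replaced here by a plain blend.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MemoryLevel:
    """One latent memory in the hierarchy (hypothetical toy version)."""
    period: int          # written once every `period` time steps (assumed schedule)
    decay: float = 0.5   # blend weight of the toy update rule (assumption)
    state: float = 0.0   # scalar stand-in for a latent feature map
    updates: int = 0     # how many times this level has been written


def step(levels: List[MemoryLevel], t: int, event_feature: float) -> None:
    """Advance the hierarchy by one time step t (0-indexed).

    Level 0 reads the incoming event feature; each higher level reads the
    state of the level directly below it, so information flows from the
    fast memories to the slow ones.
    """
    for i, level in enumerate(levels):
        if (t + 1) % level.period != 0:
            continue  # this level is not due for an update at step t
        source = event_feature if i == 0 else levels[i - 1].state
        # Toy write rule: exponential moving average of the incoming signal.
        level.state = level.decay * level.state + (1.0 - level.decay) * source
        level.updates += 1


# Three levels: fast (every step), medium (every 4 steps), slow (every 16 steps).
levels = [MemoryLevel(period=1), MemoryLevel(period=4), MemoryLevel(period=16)]
for t in range(16):
    step(levels, t, event_feature=1.0)

update_counts = [lv.updates for lv in levels]  # [16, 4, 1]
```

Over 16 steps the fast level is written 16 times, the middle level 4 times, and the slow level once, which is the sense in which slower levels avoid recomputing content that changes slowly.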