Event cameras are emerging bio-inspired vision sensors that report per-pixel brightness changes asynchronously. They offer high dynamic range, high-speed response, and a low power budget, which make them well suited to capturing local motion in uncontrolled environments. These advantages motivate us to unlock the potential of event cameras for human pose estimation, a task that has rarely been explored with this sensor. However, because event cameras depart from the conventional frame-based paradigm, the event signal within a time interval carries very limited information: event cameras capture only moving body parts and ignore static ones, so some body parts appear incomplete or vanish entirely within the interval. This paper proposes a novel densely connected recurrent architecture to address this problem of incomplete information. With this recurrent architecture, we explicitly model both sequential and non-sequential geometric consistency across time steps, accumulating information from previous frames to recover the entire human body and achieving stable and accurate human pose estimation from event data. Moreover, to better evaluate our model, we collect a large-scale multimodal event-based dataset with human pose annotations, which is, to the best of our knowledge, by far the most challenging of its kind. Experimental results on two public datasets and our own dataset demonstrate the effectiveness and strength of our approach. Code will be made available online to facilitate future research.