Dancing, a sport beloved worldwide, is increasingly being integrated into traditional and virtual reality-based gaming platforms, opening up new opportunities in the technology-mediated dancing space. These platforms primarily rely on passive, continuous human pose estimation as their input capture mechanism. Existing solutions for dance games are mainly based on RGB or RGB-Depth cameras. The former suffer in low-light conditions due to motion blur and low sensitivity, while the latter are power-hungry and have a low frame rate and a limited working distance. With its ultra-low latency, energy efficiency, and wide dynamic range, the event camera is a promising way to overcome these shortcomings. We propose YeLan, an event camera-based 3D high-frequency human pose estimation (HPE) system that remains reliable in low-light conditions and with dynamic backgrounds. We collected the world's first event camera dance dataset and developed a fully customizable, physics-aware motion-to-event simulator. YeLan outperforms the baseline models in these challenging conditions and demonstrates robustness to different types of clothing, background motion, viewing angle, occlusion, and lighting fluctuations.