Ground pressure exerted by the human body is a valuable source of information for human activity recognition (HAR) in unobtrusive pervasive sensing. Because collecting pressure-sensor data to develop HAR solutions requires significant resources and effort, we present PresSim, a novel end-to-end framework that synthesizes sensor data from videos of human activities to reduce that effort significantly. PresSim follows a three-stage process: first, it extracts 3D activity information from videos with computer vision architectures; second, it simulates floor mesh deformation profiles from the 3D activity information using a physics simulation that includes gravity; finally, it generates the simulated pressure sensor data with deep learning models. We explored two approaches for obtaining the 3D activity information: inverse kinematics with mesh re-targeting, and volumetric pose and shape estimation. We validated PresSim in an experimental setup in which a monocular camera provided the input and a pressure-sensing fitness mat (80x28 spatial resolution) provided the sensor ground truth while nine participants performed a set of predefined yoga sequences.
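The three-stage data flow can be illustrated with a minimal sketch. It shows only how the stages compose (video, then 3D activity information, then deformation profiles, then pressure maps). Every name in it (`run_pipeline`, the `stub_*` functions, `MAT_ROWS`, `MAT_COLS`) is hypothetical, the stage bodies are placeholders rather than the computer vision, physics, or deep learning components of PresSim, and reading 80x28 as rows by columns is an assumption.

```python
from typing import Callable, List

# 80x28 is the mat resolution given in the text; rows x columns is an assumption.
MAT_ROWS, MAT_COLS = 80, 28

Grid = List[List[float]]  # one 2D frame of values


def run_pipeline(video, extract_3d: Callable, simulate_deformation: Callable,
                 to_pressure: Callable) -> List[Grid]:
    """Chain the stages: video -> 3D activity -> deformation -> pressure maps."""
    activity_3d = extract_3d(video)                  # stage 1: computer vision
    deformation = simulate_deformation(activity_3d)  # stage 2: physics simulation
    return [to_pressure(d) for d in deformation]     # stage 3: learned mapping, per frame


# Hypothetical stand-ins so the sketch runs; they do not model the real stages.
def stub_extract_3d(video):
    # One placeholder "pose" per video frame: here just the frame index.
    return list(range(len(video)))


def stub_simulate_deformation(activity_3d):
    # One zero-filled deformation grid per pose.
    return [[[0.0] * MAT_COLS for _ in range(MAT_ROWS)] for _ in activity_3d]


def stub_to_pressure(deformation: Grid) -> Grid:
    # Identity copy in place of a trained model.
    return [row[:] for row in deformation]


pressure_maps = run_pipeline(
    ["frame0", "frame1", "frame2"],
    stub_extract_3d,
    stub_simulate_deformation,
    stub_to_pressure,
)
print(len(pressure_maps), len(pressure_maps[0]), len(pressure_maps[0][0]))
```

Passing the stages as callables keeps the composition separate from any particular choice for stage 1, so either of the two 3D extraction approaches could in principle be swapped in without changing the rest of the chain.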