Monocular 3D motion capture (mocap) benefits many applications. However, a single camera often fails to handle occlusions of different body parts and is therefore limited to capturing relatively simple movements. We present HybridCap, a lightweight hybrid mocap technique that augments the camera with only 4 Inertial Measurement Units (IMUs) within a learning-and-optimization framework. We first employ a weakly supervised, hierarchical motion inference module built from cooperative Gated Recurrent Unit (GRU) blocks that serve as limb, body, and root trackers as well as an inverse kinematics (IK) solver. Through coarse-to-fine pose estimation, the network effectively narrows the search space of plausible motions and handles challenging movements efficiently. We further develop a hybrid optimization scheme that combines inertial feedback and visual cues to improve tracking accuracy. Extensive experiments on various datasets demonstrate that HybridCap robustly handles challenging movements ranging from fitness actions to Latin dance, while running in real time at up to 60 fps with state-of-the-art accuracy.
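The abstract does not specify layer sizes, input features, or exactly how the GRU blocks pass information to one another. The following is therefore only a minimal NumPy sketch of the coarse-to-fine data flow it describes: a limb tracker feeds a body tracker, whose joint estimates feed a root tracker and an IK head. All names (`GRUBlock`, `HierarchicalTracker`), feature sizes, and the wiring between blocks are hypothetical assumptions, not HybridCap's actual architecture. Weights are random and untrained, and neither the weak supervision nor the hybrid optimization stage is shown.

```python
import numpy as np

# HYPOTHETICAL sketch: all dimensions and block wiring are assumptions,
# not the HybridCap architecture.
N_IMU, IMU_FEAT = 4, 12   # per IMU: 3D acceleration + flattened 3x3 rotation
N_KP2D = 17               # 2D keypoints detected in the camera image
N_LIMB, N_JOINTS = 4, 24  # limb end-effectors / full-body joints
ROT_DIM = 6               # 6D rotation representation per joint
HID = 64                  # hidden size of every GRU block


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class GRUBlock:
    """Single-layer GRU cell (no biases) followed by a linear readout."""

    def __init__(self, in_dim, out_dim, rng, hid_dim=HID):
        s = 1.0 / np.sqrt(hid_dim)
        shape = (hid_dim, in_dim + hid_dim)
        self.Wz, self.Wr, self.Wh = (rng.uniform(-s, s, shape) for _ in range(3))
        self.W_out = rng.uniform(-s, s, (out_dim, hid_dim))
        self.h = np.zeros(hid_dim)  # recurrent state carried across frames

    def __call__(self, x):
        xh = np.concatenate([x, self.h])
        z = sigmoid(self.Wz @ xh)  # update gate
        r = sigmoid(self.Wr @ xh)  # reset gate
        h_cand = np.tanh(self.Wh @ np.concatenate([x, r * self.h]))
        self.h = (1.0 - z) * self.h + z * h_cand
        return self.W_out @ self.h


class HierarchicalTracker:
    """Hypothetical coarse-to-fine cascade: limbs -> body -> root, plus IK."""

    def __init__(self, seed=0):
        rng = np.random.default_rng(seed)
        imu_dim, kp_dim = N_IMU * IMU_FEAT, N_KP2D * 2
        self.limb = GRUBlock(imu_dim + kp_dim, N_LIMB * 3, rng)
        self.body = GRUBlock(imu_dim + kp_dim + N_LIMB * 3, N_JOINTS * 3, rng)
        self.root = GRUBlock(imu_dim + N_JOINTS * 3, 3, rng)
        self.ik = GRUBlock(N_JOINTS * 3 + N_LIMB * 3, N_JOINTS * ROT_DIM, rng)

    def step(self, imu, kp2d):
        obs = np.concatenate([imu.ravel(), kp2d.ravel()])
        limb = self.limb(obs)                                    # coarse: limb end-effectors
        joints = self.body(np.concatenate([obs, limb]))          # finer: all joint positions
        root = self.root(np.concatenate([imu.ravel(), joints]))  # global root translation
        rots = self.ik(np.concatenate([joints, limb]))           # joint rotations (learned IK)
        return {
            "limb": limb.reshape(N_LIMB, 3),
            "joints": joints.reshape(N_JOINTS, 3),
            "root": root,
            "rotations": rots.reshape(N_JOINTS, ROT_DIM),
        }


tracker = HierarchicalTracker()
rng = np.random.default_rng(1)
for _ in range(3):  # three frames of synthetic IMU + keypoint input
    out = tracker.step(rng.normal(size=(N_IMU, IMU_FEAT)), rng.normal(size=(N_KP2D, 2)))
print({k: v.shape for k, v in out.items()})
# -> {'limb': (4, 3), 'joints': (24, 3), 'root': (3,), 'rotations': (24, 6)}
```

Each stage conditions on the previous stage's output, which illustrates how a hierarchy can narrow the space of plausible poses before the full-body and IK estimates are made. Each block also keeps its own hidden state across frames, matching the recurrent, per-frame tracking setting.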