Multimodal Active Measurement for Human Mesh Recovery in Close Proximity

For safe and sophisticated physical human-robot interaction (pHRI), a robot needs to accurately estimate the body pose or mesh of the target person. In pHRI scenarios, however, the target person is usually close to the robot, so the robot's onboard cameras cannot observe the whole body. The resulting truncation and occlusion severely degrade the accuracy of human pose estimation. To improve the accuracy of human pose estimation and mesh recovery from this limited camera information, we propose an active measurement and sensor fusion framework that combines the onboard cameras with other sensors such as touch sensors and 2D LiDAR. The touch and LiDAR measurements are obtained incidentally during pHRI at no additional cost. Although sparse, they are reliable and informative cues for human mesh recovery. In our active measurement process, camera viewpoints and sensor placements are optimized based on the uncertainty of the estimated pose, which is closely related to the truncated or occluded areas. In our sensor fusion process, we fuse the sensor measurements with the camera-based pose estimate by minimizing the distance between the estimated mesh and the measured positions. Our method is agnostic to robot configuration. Experiments were conducted using the Toyota Human Support Robot, which has a camera, a 2D LiDAR, and a touch sensor on its arm. The proposed method demonstrated superior human pose estimation accuracy in quantitative comparisons. Furthermore, it reliably estimated the pose of the target person in practical settings, such as when the person is occluded by a blanket or is assisted in standing up by the robot arm.
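The fusion step described above, minimizing the distance between the estimated mesh and the measured positions, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the mesh is a plain array of 3D vertices, that touch and LiDAR readings are already expressed as 3D points in the same frame, and that only a global translation of the mesh is optimized. The abstract does not specify the optimization variables, so the translation-only correction and the function names `nearest_vertex_distances` and `fit_translation` are hypothetical simplifications.

```python
import numpy as np


def nearest_vertex_distances(vertices, points):
    """For each measured point, return the index of and distance to the closest mesh vertex."""
    # Pairwise squared distances, shape (num_points, num_vertices).
    diffs = points[:, None, :] - vertices[None, :, :]
    d2 = np.sum(diffs ** 2, axis=-1)
    idx = np.argmin(d2, axis=1)
    return idx, np.sqrt(d2[np.arange(len(points)), idx])


def fit_translation(vertices, points, steps=50, lr=0.5):
    """Find a translation t that pulls the mesh toward sparse sensor measurements.

    Assumption: only a global translation is optimized, purely for illustration.
    The loss is the mean squared distance from each measured point to its
    nearest (translated) mesh vertex, minimized by gradient descent with
    correspondences recomputed at every step.
    """
    t = np.zeros(3)
    for _ in range(steps):
        moved = vertices + t
        idx, _ = nearest_vertex_distances(moved, points)
        residual = points - moved[idx]  # shape (num_points, 3)
        # Gradient of the loss w.r.t. t is -2 * mean(residual), so descend along +residual.
        t += lr * 2.0 * residual.mean(axis=0)
    return t


# Toy usage: a coarse 3x3x3 "mesh" and four measurements shifted by a small offset.
grid = np.arange(3) * 0.5
mesh = np.array([[x, y, z] for x in grid for y in grid for z in grid], dtype=float)
measured = mesh[[0, 5, 13, 26]] + np.array([0.05, -0.02, 0.03])
correction = fit_translation(mesh, measured)
```

In this sketch a handful of sparse points is enough to correct a global offset because every measurement constrains the same three parameters. Optimizing a full set of body pose parameters would follow the same pattern, with the translation replaced by those parameters.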
