3D Human Mesh Reconstruction (HMR) from 2D RGB images struggles in settings with poor lighting, occlusions, or privacy concerns. Acoustic signals can complement RGB imaging in these settings: they are widely available, easy to deploy, and capable of penetrating obstacles. However, no existing method effectively combines acoustic signals with RGB data for robust 3D HMR. The primary challenges are the low resolution of images generated from acoustic signals and the lack of backbones dedicated to processing them. We introduce SonicMesh, a novel approach that combines acoustic signals with RGB images to reconstruct 3D human meshes. To address both challenges, we modify an existing method, HRNet, to extract features effectively from acoustic images. We also integrate a universal feature embedding technique to enhance the precision of cross-dimensional feature alignment, enabling SonicMesh to achieve high accuracy. Experimental results demonstrate that SonicMesh accurately reconstructs 3D human meshes under challenging conditions such as occlusion, non-line-of-sight scenarios, and poor lighting.