Human pose and shape estimation (HPS) has attracted increasing attention in recent years. While most existing studies focus on HPS from 2D images or videos, which suffer from inherent depth ambiguity, there is a surging need to investigate HPS from 3D point clouds, as depth sensors are now frequently employed in commercial devices. However, 3D points from real-world sensors are usually noisy and incomplete, and human bodies exhibit highly diverse poses. To tackle these challenges, we propose a principled framework, PointHPS, for accurate 3D HPS from point clouds captured in real-world settings, which iteratively refines point features through a cascaded architecture. Specifically, each stage of PointHPS performs a series of downsampling and upsampling operations to extract and collate both local and global cues, which are further enhanced by two novel modules: 1) Cross-stage Feature Fusion (CFF) for multi-scale feature propagation, which allows information to flow effectively through the stages, and 2) Intermediate Feature Enhancement (IFE) for body-aware feature aggregation, which improves feature quality after each stage. To facilitate a comprehensive study under various scenarios, we conduct our experiments on two large-scale benchmarks, comprising i) a dataset that features diverse subjects and actions captured by real commercial sensors in a laboratory environment, and ii) controlled synthetic data generated with realistic considerations such as clothed humans in crowded outdoor scenes. Extensive experiments demonstrate that PointHPS, with its powerful point feature extraction and processing scheme, outperforms state-of-the-art methods by significant margins across the board. Homepage: https://caizhongang.github.io/projects/PointHPS/.
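To make the idea of a stage that downsamples and then upsamples point features more concrete, the following is a minimal sketch and not the PointHPS implementation. The abstract does not name the operators used, so farthest point sampling and nearest-neighbor interpolation are assumptions borrowed from common point-cloud practice, and a plain average stands in for the learned layers. The function names (`farthest_point_sampling`, `nearest_neighbor_upsample`, `stage`, `cascade`) are hypothetical, and the CFF and IFE modules are not modeled.

```python
import math


def farthest_point_sampling(points, k):
    """Return k indices spread over the cloud (assumed downsampling operator)."""
    chosen = [0]  # seed with the first point; the seed choice is arbitrary
    dist = [math.dist(p, points[0]) for p in points]
    while len(chosen) < k:
        # Pick the point farthest from everything chosen so far.
        nxt = max(range(len(points)), key=dist.__getitem__)
        chosen.append(nxt)
        dist = [min(d, math.dist(p, points[nxt])) for d, p in zip(dist, points)]
    return chosen


def nearest_neighbor_upsample(points, coarse_points, coarse_feats):
    """Give every dense point the feature of its nearest coarse point."""
    out = []
    for p in points:
        j = min(range(len(coarse_points)),
                key=lambda i: math.dist(p, coarse_points[i]))
        out.append(coarse_feats[j])
    return out


def stage(points, feats, k):
    """One hypothetical stage: downsample, upsample, then merge the cues."""
    idx = farthest_point_sampling(points, k)
    coarse_points = [points[i] for i in idx]
    coarse_feats = [feats[i] for i in idx]
    up = nearest_neighbor_upsample(points, coarse_points, coarse_feats)
    # Averaging is a stand-in for learned layers that would combine
    # per-point (local) and coarse (more global) features.
    return [[(a + b) / 2 for a, b in zip(f, u)] for f, u in zip(feats, up)]


def cascade(points, feats, k, num_stages):
    """Apply the stage repeatedly, refining the features each time."""
    for _ in range(num_stages):
        feats = stage(points, feats, k)
    return feats


points = [(0, 0, 0), (1, 0, 0), (10, 0, 0), (11, 0, 0)]
feats = [[0.0], [2.0], [4.0], [8.0]]
refined = cascade(points, feats, k=2, num_stages=2)
```

In this toy run the two sampled points sit at opposite ends of the cloud, so each refinement pass pulls every feature toward the feature of its nearest sampled point. A real model would replace the averaging with trainable layers and would sample at several resolutions per stage.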