This technical report presents the first-place solution to the 2023 Waymo Open Dataset Pose Estimation challenge. Because large-scale 3D human keypoint annotations are difficult to acquire, previous methods have commonly relied on 2D image features and 2D sequential annotations for 3D human pose estimation. In contrast, our proposed method, LPFormer, takes only LiDAR data as input and is trained with the corresponding 3D annotations. LPFormer consists of two stages: the first detects human bounding boxes and extracts multi-level feature representations, and the second uses a transformer-based network to regress the human keypoints from these features. Experiments on the Waymo Open Dataset show that LPFormer achieves top performance and improves even on previous multi-modal solutions.
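The two-stage data flow described above can be illustrated with a small NumPy sketch. This is not the LPFormer implementation: the bounding box is supplied by hand rather than predicted by a detector, the per-point features are random stand-ins for learned multi-level features, the weights are untrained, and every name here (`crop_points_in_box`, `regress_keypoints`, `NUM_KEYPOINTS`, `FEAT_DIM`) is hypothetical. It only shows how points and features cropped by a box might be turned into tokens, mixed with keypoint query tokens by one self-attention layer, and mapped to 3D keypoint coordinates.

```python
import numpy as np

NUM_KEYPOINTS = 14  # assumed keypoint count, for illustration only
FEAT_DIM = 16       # assumed per-point feature width


def crop_points_in_box(points, features, center, size):
    """Stage 1 stand-in: keep the points (and features) inside an axis-aligned box."""
    half = np.asarray(size) / 2.0
    mask = np.all(np.abs(points - np.asarray(center)) <= half, axis=1)
    return points[mask], features[mask]


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over all tokens."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def regress_keypoints(points, features, center, params):
    """Stage 2 stand-in: keypoint queries attend to point tokens, then a linear head outputs xyz."""
    local = points - np.asarray(center)  # box-relative coordinates
    point_tokens = np.concatenate([local, features], axis=1) @ params["embed"]
    tokens = np.concatenate([params["queries"], point_tokens], axis=0)
    tokens = tokens + self_attention(tokens, params["w_q"], params["w_k"], params["w_v"])
    offsets = tokens[:NUM_KEYPOINTS] @ params["head"]  # (NUM_KEYPOINTS, 3)
    return offsets + np.asarray(center)


def init_params(rng, feat_dim=FEAT_DIM, d_model=32, scale=0.1):
    """Random, untrained weights; a real model would learn these from 3D annotations."""
    return {
        "embed": rng.normal(0, scale, (3 + feat_dim, d_model)),
        "queries": rng.normal(0, scale, (NUM_KEYPOINTS, d_model)),
        "w_q": rng.normal(0, scale, (d_model, d_model)),
        "w_k": rng.normal(0, scale, (d_model, d_model)),
        "w_v": rng.normal(0, scale, (d_model, d_model)),
        "head": rng.normal(0, scale, (d_model, 3)),
    }


rng = np.random.default_rng(0)
points = rng.uniform(-2, 2, (2000, 3))        # synthetic LiDAR points
features = rng.normal(size=(2000, FEAT_DIM))  # stand-in for learned point features
center, size = (1.0, 0.5, 0.0), (1.0, 1.0, 2.0)  # hand-picked "detected" box

box_points, box_feats = crop_points_in_box(points, features, center, size)
keypoints = regress_keypoints(box_points, box_feats, center, init_params(rng))
print(keypoints.shape)  # (14, 3)
```

Predicting keypoints as offsets from the box center keeps the regression target small and independent of where the person stands in the scene, which is a common design choice in box-conditioned pose heads; whether LPFormer does exactly this is not stated in the abstract.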