Autonomous robotic systems operating in human environments must understand their surroundings to make accurate and safe decisions. In crowded human scenes with close-up human-robot interaction and robot navigation, a deep understanding of the surroundings requires reasoning about human motion and body dynamics over time through human body pose estimation and tracking. However, existing datasets either do not provide pose annotations or include scene types unrelated to robotic applications. Many datasets also lack the diversity of poses and occlusions found in crowded human scenes. To address this limitation, we introduce JRDB-Pose, a large-scale dataset and benchmark for multi-person pose estimation and tracking using videos captured from a social navigation robot. The dataset contains challenging scenes in crowded indoor and outdoor locations, with a diverse range of scales and occlusion types. JRDB-Pose provides human pose annotations with per-keypoint occlusion labels and track IDs that are consistent across the scene. A public evaluation server is available for fair evaluation on a held-out test set. JRDB-Pose is available at https://jrdb.erc.monash.edu/.