Predicting human trajectories is crucial for social robot navigation in crowded environments. While most existing approaches treat humans as point masses, we present a study on multi-agent trajectory prediction that leverages human skeletal features for improved forecast accuracy. In particular, we systematically evaluate the predictive utility of 2D and 3D skeletal keypoints and derived biomechanical cues as additional inputs. Through a comprehensive study on the JRDB dataset and a new dataset for social navigation with 360-degree panoramic videos, we find that focusing on lower-body 3D keypoints yields a 13% reduction in Average Displacement Error (ADE), and that augmenting 3D keypoint inputs with corresponding biomechanical cues provides a further 1-4% improvement. Notably, the performance gain persists when using 2D keypoint inputs extracted from equirectangular panoramic images, indicating that monocular surround vision can capture informative cues for motion forecasting. Our finding that robots can forecast human movement efficiently by watching their legs provides actionable insights for designing sensing capabilities for social robot navigation.
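The Average Displacement Error reported above is the standard trajectory-forecasting metric: the mean Euclidean distance between predicted and ground-truth positions over all forecast timesteps. A minimal sketch of it follows; the array shapes and function name are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def average_displacement_error(pred: np.ndarray, gt: np.ndarray) -> float:
    """Compute ADE for predicted vs. ground-truth trajectories.

    pred, gt: arrays of shape (num_agents, num_timesteps, 2)
    holding x/y positions (illustrative layout, assumed here).
    """
    # Per-step Euclidean distance, averaged over all agents and timesteps.
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

# Toy check: one agent, three timesteps, constant 0.5 m offset.
gt = np.zeros((1, 3, 2))
pred = gt + np.array([0.5, 0.0])
print(average_displacement_error(pred, gt))  # → 0.5
```

A 13% reduction in this quantity, as reported for lower-body 3D keypoint inputs, thus corresponds directly to predicted positions landing 13% closer to the true path on average.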