Anticipating the motion of all humans in dynamic environments such as homes and offices is critical for safe and effective robot navigation. Such spaces remain challenging because humans do not follow strict rules of motion, and occluded entry points such as corners and doors often create opportunities for sudden encounters. In this work, we present a Transformer-based architecture that predicts future human trajectories in human-centric environments from input features including human positions, head orientations, and 3D skeletal keypoints, all extracted from onboard, in-the-wild sensor data. The resulting model captures the inherent uncertainty in future human trajectory prediction and achieves state-of-the-art performance both on common prediction benchmarks and on a human tracking dataset, captured from a mobile robot, that has been adapted for the prediction task. Furthermore, we identify new agents with limited historical data as a major contributor to error and demonstrate that 3D skeletal poses provide complementary information that reduces prediction error in these challenging scenarios.