To accurately predict a pedestrian's trajectory in a crowd, one must consistently account for their underlying socio-temporal interactions with other pedestrians. Unlike existing work, which represents this information separately, partially, or implicitly, we propose a complete representation that allows it to be captured and analyzed fully and explicitly. In particular, we introduce a Directed Acyclic Graph-based structure, which we term the Socio-Temporal Graph (STG), to explicitly capture pairwise socio-temporal interactions among a group of people across both space and time. Our model is built on a time-varying generative process whose latent variables determine the structure of the STGs. We design an attention-based model, named STGformer, that provides an end-to-end pipeline for learning the structure of the STGs for trajectory prediction. Our solution achieves overall state-of-the-art prediction accuracy on two large-scale benchmark datasets. Our analysis shows that one person's past trajectory is critical for predicting another person's future path, and that our model learns this relationship with a strong notion of socio-temporal locality. Statistics show that explicitly exploiting this information for prediction yields a noticeable performance gain over trajectory-only approaches.
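The abstract does not specify how STG edges are realized inside STGformer. As a rough illustration only, the sketch below assumes one plausible reading: each graph node is a (person, time step) pair, edges run only from strictly earlier to later time steps (which is what keeps the graph acyclic), and an optional `window` parameter restricts edges to nearby time steps as a crude stand-in for socio-temporal locality. The helper names `stg_mask` and `stg_attention` and the `window` parameter are hypothetical; the attention uses identity query/key/value projections for brevity. This is not the authors' STGformer, which learns the graph structure rather than fixing it by hand.

```python
from typing import Optional, Tuple

import numpy as np


def stg_mask(num_people: int, num_steps: int, window: Optional[int] = None) -> np.ndarray:
    """Hypothetical boolean adjacency over (person, time) nodes; mask[u, v] is True if v -> u is an edge."""
    # Node index k = t * num_people + p encodes person p at time step t.
    times = np.repeat(np.arange(num_steps), num_people)
    # dt[u, v] > 0 means node v lies strictly in the past of node u.
    dt = times[:, None] - times[None, :]
    # Edges only go from earlier to later steps, so the graph has no cycles.
    mask = dt > 0
    if window is not None:
        # Hypothetical locality window: ignore nodes more than `window` steps back.
        mask &= dt <= window
    return mask


def stg_attention(x: np.ndarray, mask: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """Single-head scaled dot-product attention restricted to STG edges (identity projections)."""
    n, d = x.shape
    scores = x @ x.T / np.sqrt(d)
    # Self-loops are added for attention only, so nodes at t = 0 (no predecessors) keep their own features.
    allowed = mask | np.eye(n, dtype=bool)
    scores = np.where(allowed, scores, -np.inf)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ x, weights


# Toy usage: 3 pedestrians observed over 4 time steps with 8-dim random features.
P, T, d = 3, 4, 8
rng = np.random.default_rng(0)
x = rng.standard_normal((P * T, d))
mask = stg_mask(P, T, window=2)
out, w = stg_attention(x, mask)
```

Because nodes are ordered by time step, the edge mask is strictly lower-triangular, which is a simple way to guarantee acyclicity; the resulting attention weights can be read as soft edge strengths between one person's past and another person's present.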