Predicting crowd intents and trajectories is crucial in various real-world applications, including service robots and autonomous vehicles. Understanding environmental dynamics is challenging, not only because of the complexity of modeling pair-wise spatial and temporal interactions but also because of the diverse influence of group-wise interactions. To decode the comprehensive pair-wise and group-wise interactions in crowded scenarios, we introduce Hyper-STTN, a Hypergraph-based Spatial-Temporal Transformer Network for crowd trajectory prediction. In Hyper-STTN, group-wise correlations in the crowd are constructed using a set of multi-scale hypergraphs with varying group sizes and captured through random-walk probability-based hypergraph spectral convolution. Additionally, a spatial-temporal transformer is adapted to capture pedestrians' pair-wise latent interactions in the spatial-temporal dimensions. These heterogeneous group-wise and pair-wise interactions are then fused and aligned through a multimodal transformer network. Hyper-STTN outperforms other state-of-the-art baselines and ablation models on five real-world pedestrian motion datasets.
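The group-wise branch can be illustrated with a minimal NumPy sketch. It is not the Hyper-STTN implementation: the abstract does not specify how groups are formed or how the convolution is parameterized, so the nearest-neighbour grouping, the function names (`knn_hypergraph`, `random_walk_matrix`, `hypergraph_conv`), the ReLU activation, and the concatenation across scales are all assumptions made for illustration. The propagation step uses the commonly used hypergraph random-walk transition matrix $P = D_v^{-1} H W D_e^{-1} H^\top$, where $H$ is the node-by-hyperedge incidence matrix.

```python
import numpy as np

def knn_hypergraph(positions, group_size):
    """Assumed grouping rule: hyperedge j holds pedestrian j and its nearest neighbours.

    Returns an N x N incidence matrix H with H[v, e] = 1 if pedestrian v is in hyperedge e.
    """
    n = positions.shape[0]
    dists = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    # Each row lists the group_size closest pedestrians (the pedestrian itself is at distance 0).
    members = np.argsort(dists, axis=1)[:, :group_size]
    H = np.zeros((n, n))
    for edge, nodes in enumerate(members):
        H[nodes, edge] = 1.0
    return H

def random_walk_matrix(H, edge_weights=None):
    """Transition matrix P = Dv^-1 H W De^-1 H^T of a random walk on the hypergraph."""
    w = np.ones(H.shape[1]) if edge_weights is None else edge_weights
    node_deg = H @ w          # d(v): weighted number of hyperedges containing v
    edge_deg = H.sum(axis=0)  # delta(e): number of pedestrians in hyperedge e
    return (H * (w / edge_deg)) @ H.T / node_deg[:, None]

def hypergraph_conv(X, H, theta):
    """One propagation step: mix features along the random walk, project, apply ReLU."""
    return np.maximum(random_walk_matrix(H) @ X @ theta, 0.0)

# Toy scene: 6 pedestrians with 2-D positions and 4-D features (random placeholders).
rng = np.random.default_rng(0)
positions = rng.normal(size=(6, 2))
features = rng.normal(size=(6, 4))
theta = rng.normal(size=(4, 8))  # hypothetical learnable projection

# Multi-scale: one hypergraph per group size, outputs concatenated per pedestrian.
outputs = [hypergraph_conv(features, knn_hypergraph(positions, k), theta) for k in (2, 3)]
multi_scale = np.concatenate(outputs, axis=1)  # shape (6, 16)
```

Each row of `random_walk_matrix(H)` sums to one, so a pedestrian's updated feature is a weighted average over everyone it shares a group with. Varying `group_size` changes how far that averaging reaches, which is the intuition behind using several scales.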