Effective interaction modeling and behavior prediction of dynamic agents play a significant role in interactive motion planning for autonomous robots. Although existing methods have improved prediction accuracy, few research efforts have been devoted to enhancing prediction model interpretability and out-of-distribution (OOD) generalizability. This work addresses these two challenging aspects by designing a variational auto-encoder framework that integrates graph-based representations and time-sequence models to efficiently capture spatio-temporal relations between interactive agents and predict their dynamics. Our model infers dynamic interaction graphs in a latent space augmented with interpretable edge features that characterize the interactions. Moreover, we aim to enhance model interpretability and performance in OOD scenarios by disentangling the latent space of edge features, thereby strengthening model versatility and robustness. We validate our approach through extensive experiments on both simulated and real-world datasets. The results show superior performance compared to existing methods in modeling spatio-temporal relations, motion prediction, and identifying time-invariant latent features.