Deploying service robots in daily life, whether in restaurants, warehouses or hospitals, requires reasoning about the interactions that take place in dense and dynamic scenes. In this paper, we present and benchmark three new approaches to modelling and predicting multi-agent interactions in dense scenes, including the use of an intuitive qualitative representation. The proposed solutions take static and dynamic context into account to predict individual interactions. They exploit an input-attention and a temporal-attention mechanism, and are tested on medium- and long-term time horizons. The first two approaches integrate different relations from the so-called Qualitative Trajectory Calculus (QTC) within a state-of-the-art deep neural network to create a symbol-driven neural architecture for predicting spatial interactions. The third approach implements a purely data-driven network for motion prediction, the output of which is post-processed to predict QTC spatial interactions. Experimental results on a popular robot dataset of challenging crowded scenarios show that the purely data-driven prediction approach generally outperforms the other two. The three approaches were further evaluated on a different but related human scenario to assess their generalisation capability.
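To make the post-processing step of the third approach more concrete, the following is a minimal sketch of how two predicted 2D trajectories might be mapped to qualitative distance relations in the spirit of the basic QTC variant. It is an illustration under stated assumptions, not the implementation evaluated in the paper: the function names `qtc_b` and `_sign_symbol`, the tolerance `eps`, and the simplified one-step distance comparison are all hypothetical choices.

```python
import math


def _sign_symbol(delta, eps):
    """Map a change in distance to a qualitative symbol."""
    if delta < -eps:
        return "-"  # distance shrinks: moving towards the other agent
    if delta > eps:
        return "+"  # distance grows: moving away from the other agent
    return "0"      # change within tolerance: stable


def qtc_b(traj_k, traj_l, eps=1e-3):
    """Return one (q1, q2) pair per pair of consecutive time steps.

    traj_k, traj_l: equal-length sequences of (x, y) positions.
    q1 describes how agent k moves relative to l's current position,
    q2 describes how agent l moves relative to k's current position.
    Simplifying assumption: only the step from t to t+1 is compared.
    """
    if len(traj_k) != len(traj_l):
        raise ValueError("trajectories must have the same length")
    symbols = []
    for t in range(len(traj_k) - 1):
        d_now = math.dist(traj_k[t], traj_l[t])
        q1 = _sign_symbol(math.dist(traj_k[t + 1], traj_l[t]) - d_now, eps)
        q2 = _sign_symbol(math.dist(traj_l[t + 1], traj_k[t]) - d_now, eps)
        symbols.append((q1, q2))
    return symbols


# Hypothetical example: agent k walks towards a stationary agent l.
k = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
l = [(10.0, 0.0), (10.0, 0.0), (10.0, 0.0)]
relations = qtc_b(k, l)
```

In this sketch the tolerance `eps` controls when a small change in distance is treated as "stable"; a suitable value would depend on sensor noise and the sampling rate of the trajectories.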