Understanding and predicting human actions has been a long-standing challenge and is a crucial capability for perception in robotics and AI. While significant progress has been made in anticipating the future actions of individual agents, prior work has largely overlooked a key aspect of real-world human activity -- interactions. To address this gap in human-like forecasting within multi-agent environments, we present the Hierarchical Memory-Aware Transformer (HiMemFormer), a transformer-based model for online multi-agent action anticipation. HiMemFormer integrates and distributes a global memory that captures joint historical information across all agents through a transformer framework, together with a hierarchical local-memory decoder that interprets agent-specific features conditioned on these global representations using a coarse-to-fine strategy. In contrast to previous approaches, HiMemFormer applies the global context hierarchically, conditioned on agent-specific preferences, to avoid noisy or redundant information in multi-agent action anticipation. Extensive experiments on various multi-agent scenarios demonstrate that HiMemFormer significantly outperforms other state-of-the-art methods.
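The global-memory-plus-local-decoder idea above can be sketched in a few lines. The following is a minimal, illustrative NumPy sketch under loose assumptions: single-head attention with no learned projections, mean-free pooling, and arbitrary shapes and names (`global_memory`, `anticipate`) chosen for exposition -- it is not the paper's actual architecture.

```python
# Illustrative sketch of the idea described above: a shared global memory
# over all agents' histories, plus a per-agent coarse-to-fine decoder that
# cross-attends to it. Shapes, names, and the single-head attention are
# assumptions for illustration, not the published HiMemFormer model.
import numpy as np

rng = np.random.default_rng(0)

def attention(q, k, v):
    """Scaled dot-product attention (single head, no learned projections)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

n_agents, hist_len, d = 3, 8, 16
# Per-agent history features (e.g. per-frame embeddings of each agent).
histories = rng.normal(size=(n_agents, hist_len, d))

# Global memory: joint historical information across all agents, here
# simply the self-attended concatenation of every agent's history.
joint = histories.reshape(n_agents * hist_len, d)
global_memory = attention(joint, joint, joint)

def anticipate(agent_id):
    """Coarse-to-fine decoding of one agent's next-action feature."""
    local = histories[agent_id]                 # agent-specific history
    # Coarse stage: read the shared global context.
    coarse = attention(local, global_memory, global_memory)
    # Fine stage: refine the global read with agent-specific features.
    fine = attention(coarse, local, local)
    return fine[-1]                             # feature for the next step

pred = anticipate(0)
print(pred.shape)  # (16,)
```

The sketch only conveys the information flow: all agents write into one shared memory, and each agent reads it back filtered through its own history, which is the coarse-to-fine pattern the abstract describes.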