The objective of the panoramic activity recognition task is to identify behaviors at multiple granularities within crowded and complex environments, encompassing individual actions, social group activities, and global activities. Existing methods generally use either parameter-independent modules to capture task-specific features or parameter-sharing modules to obtain features common to all tasks. However, tasks at different granularities are often strongly interrelated and complementary, a property that previous methods have overlooked. In this paper, we propose a model called MPT-PAR that simultaneously considers both the unique characteristics of each task and the synergies between different tasks, thereby maximizing the utilization of features across multi-granularity activity recognition. Furthermore, we emphasize the significance of temporal and spatial information by introducing a spatio-temporal relation-enhanced module and a scene representation learning module, which integrate the spatio-temporal context of actions and the global scene into the feature map of each granularity. Our method achieves an overall F1 score of 47.5\% on the JRDB-PAR dataset, significantly outperforming all state-of-the-art methods.