This paper proposes Group Activity Feature (GAF) learning, in which the features of a multi-person activity are learned as a compact latent vector. Unlike prior work, which requires manual annotation of group activities for supervised learning, our method learns the GAF through person attribute prediction, without group activity annotations. The whole network is trained end-to-end so that the GAF is required for predicting the attributes of each person in a group; as a result, the GAF is trained to encode the features of the multi-person activity. As person attributes, we propose to use each person's action class and appearance features, because the former is easy to annotate owing to its simplicity and the latter requires no manual annotation. In addition, we introduce location-guided attribute prediction, which disentangles the complex GAF so that the features of each target person can be extracted properly. Experimental results validate that our method outperforms state-of-the-art methods both quantitatively and qualitatively on two public datasets. Visualization of our GAF also demonstrates that our method learns a GAF that represents fine-grained group activity classes. Code: https://github.com/chihina/GAFL-CVPR2024.