Intention recognition has traditionally focused on individual intentions, overlooking the complexities of collective intentions in group settings. To address this limitation, we introduce the concept of group intention, which represents a shared goal that emerges through the actions of multiple individuals. We also introduce Group Intention Forecasting (GIF), a novel task that forecasts when a group intention will occur by analyzing individual actions and interactions before the collective goal becomes apparent. To investigate GIF in a specific scenario, we propose SHOT, the first large-scale dataset for GIF, consisting of 1,979 basketball video clips captured from 5 camera views and annotated with 6 types of individual attributes. SHOT is designed around 3 key characteristics: multi-individual information, multi-view adaptability, and multi-level intention, making it well suited for studying emerging group intentions. Furthermore, we introduce GIFT (Group Intention ForecasTer), a framework that extracts fine-grained individual features and models evolving group dynamics to forecast intention emergence. Experimental results confirm the effectiveness of SHOT and GIFT, establishing a strong foundation for future research in group intention forecasting. The dataset is available at https://xinyi-hu.github.io/SHOT_DATASET.