Action scene understanding in soccer is challenging because the game is complex and dynamic and involves constant interactions between players. This article provides a comprehensive overview of the task, divided into action recognition, action spotting, and spatio-temporal action localization, with particular emphasis on the modalities used and on multimodal methods. We survey the publicly available data sources and the metrics used to evaluate model performance. The article reviews recent state-of-the-art methods, covering both deep learning techniques and traditional approaches. We focus on multimodal methods, both those that integrate information from multiple sources, such as video and audio data, and those that represent a single source in several ways. We discuss the advantages and limitations of these methods, along with their potential for improving model accuracy and robustness. Finally, the article highlights open research questions and future directions in soccer action recognition, including the potential of multimodal methods to advance the field. Overall, this survey is intended as a resource for researchers interested in action scene understanding in soccer.