HDBN: A Novel Hybrid Dual-branch Network for Robust Skeleton-based Action Recognition

Skeleton-based action recognition has gained considerable traction because skeletal representations are succinct and robust. However, current methods often rely on a single backbone to model the skeleton modality, and so inherit that backbone's limitations. To address this and exploit the complementary strengths of different architectures, we propose a novel Hybrid Dual-Branch Network (HDBN) for robust skeleton-based action recognition, which combines the graph convolutional network's (GCN's) proficiency with graph-structured data and the Transformer's capacity for modeling global information. Specifically, HDBN consists of two trunk branches, MixGCN and MixFormer, which use GCNs and Transformers, respectively, to model both 2D and 3D skeletal modalities. HDBN was one of the top solutions in the Multi-Modal Video Reasoning and Analyzing Competition (MMVRAC) of the 2024 ICME Grand Challenge, achieving accuracies of 47.95% and 75.36% on two benchmarks of the UAV-Human dataset and outperforming most existing methods. Our code will be publicly available at: https://github.com/liujf69/ICMEW2024-Track10.
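The abstract does not say how the outputs of the two branches are combined. One common way to combine complementary models is score-level (late) fusion, sketched below as an illustration only. The function names `softmax`, `fuse_branches`, and `predict`, and the `gcn_weight` parameter, are hypothetical and are not taken from the HDBN code; the sketch assumes each branch emits one vector of class logits per sample.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw logits to probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_branches(
    gcn_logits: Sequence[float],
    former_logits: Sequence[float],
    gcn_weight: float = 0.5,
) -> List[float]:
    """Weighted average of the two branches' class probabilities.

    Assumption: late fusion with a single scalar weight. This is a generic
    ensembling technique, not necessarily what HDBN does.
    """
    if len(gcn_logits) != len(former_logits):
        raise ValueError("both branches must score the same set of classes")
    if not 0.0 <= gcn_weight <= 1.0:
        raise ValueError("gcn_weight must lie in [0, 1]")
    p_gcn = softmax(gcn_logits)
    p_former = softmax(former_logits)
    return [
        gcn_weight * a + (1.0 - gcn_weight) * b
        for a, b in zip(p_gcn, p_former)
    ]


def predict(
    gcn_logits: Sequence[float],
    former_logits: Sequence[float],
    gcn_weight: float = 0.5,
) -> int:
    """Return the index of the class with the highest fused probability."""
    fused = fuse_branches(gcn_logits, former_logits, gcn_weight)
    return max(range(len(fused)), key=fused.__getitem__)
```

In a setup like this, the weight would typically be tuned on a validation split; setting `gcn_weight` to 0 or 1 reduces the ensemble to a single branch.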

