Recognizing human actions in video sequences, known as Human Action Recognition (HAR), is a challenging task in pattern recognition. While Convolutional Neural Networks (ConvNets) have shown remarkable success in image recognition, they are not always directly applicable to HAR, because temporal features are critical for accurate classification. In this paper, we propose a novel dynamic PSO-ConvNet model, which combines Particle Swarm Optimization (PSO) with ConvNets, for learning actions in videos, building on our recent work in image recognition. Our approach uses a framework in which the weight vector of each neural network represents the position of a particle in phase space, and particles share their current weight vectors and gradient estimates of the loss function. To extend our approach to video, we integrate ConvNets with state-of-the-art temporal methods such as Transformers and Recurrent Neural Networks. Our experimental results on the UCF-101 dataset show improvements of up to 9% in accuracy, confirming the effectiveness of the proposed method. In addition, we conducted experiments on larger and more varied datasets, including Kinetics-400 and HMDB-51, and found that Collaborative Learning outperforms Non-Collaborative Learning (Individual Learning). Overall, our dynamic PSO-ConvNet model offers a promising direction for improving HAR by better capturing the spatio-temporal dynamics of human actions in videos. The code is available at https://github.com/leonlha/Video-Action-Recognition-Collaborative-Learning-with-Dynamics-via-PSO-ConvNet-Transformer.
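The abstract describes each network's weight vector as a particle position, with particles sharing their weights and gradient estimates. The minimal Python sketch below shows one way such a collaborative update could look. It is an assumption for illustration, not the update rule used in the paper. A toy quadratic loss stands in for the ConvNet training loss, and each particle keeps a velocity, so its state lives in a position-velocity phase space. The function `collaborative_step` pulls each particle toward a look-ahead point computed from the best neighbor's shared weights and gradient, on an assumed ring topology. The names `quadratic_loss`, `quadratic_grad`, and `collaborative_step`, and the coefficients `lr`, `momentum`, and `pull`, are all hypothetical.

```python
import random
from typing import Callable, List

Vector = List[float]


def quadratic_loss(w: Vector) -> float:
    # Toy stand-in for a network's training loss; minimum at w = (1, ..., 1).
    return sum((x - 1.0) ** 2 for x in w)


def quadratic_grad(w: Vector) -> Vector:
    # Exact gradient of the toy loss (a real ConvNet would use backprop).
    return [2.0 * (x - 1.0) for x in w]


def collaborative_step(
    positions: List[Vector],
    velocities: List[Vector],
    loss: Callable[[Vector], float],
    grad: Callable[[Vector], Vector],
    lr: float = 0.1,
    momentum: float = 0.5,
    pull: float = 0.3,
) -> None:
    """Hypothetical synchronous update of all particles, done in place."""
    # Every particle "publishes" its current weights and gradient estimate.
    grads = [grad(w) for w in positions]
    losses = [loss(w) for w in positions]
    n = len(positions)
    new_positions = []
    for i in range(n):
        # Assumed ring neighborhood: the particle itself and its two neighbors.
        neighbors = [(i - 1) % n, i, (i + 1) % n]
        best = min(neighbors, key=lambda j: losses[j])
        # Look-ahead target: best neighbor's weights after one gradient step,
        # built from that neighbor's shared weights and shared gradient.
        target = [wb - lr * gb for wb, gb in zip(positions[best], grads[best])]
        # Velocity = momentum term + own gradient step + pull toward target.
        v = [
            momentum * vi - lr * gi + pull * (ti - wi)
            for vi, gi, ti, wi in zip(velocities[i], grads[i], target, positions[i])
        ]
        velocities[i] = v
        new_positions.append([wi + vi for wi, vi in zip(positions[i], v)])
    positions[:] = new_positions


# Usage: five 4-dimensional particles with random starting weights.
random.seed(0)
dim, n_particles = 4, 5
positions = [[random.uniform(-3.0, 3.0) for _ in range(dim)] for _ in range(n_particles)]
velocities = [[0.0] * dim for _ in range(n_particles)]
for _ in range(200):
    collaborative_step(positions, velocities, quadratic_loss, quadratic_grad)
print(max(quadratic_loss(w) for w in positions))
```

For this toy quadratic and these coefficients, each particle's combined position-velocity error contracts on every step, because the chosen neighbor never has a higher loss than the particle itself. The swarm therefore settles at the shared minimum. In an actual ConvNet, the gradient would come from mini-batch backpropagation, and the loss landscape would not be convex, so this only illustrates the mechanics of sharing weights and gradients.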