Human activity recognition is a major field of study that applies computer vision, machine vision, and deep learning techniques to classify human actions. Deep learning has advanced considerably, yielding architectures that capture human dynamics effectively. This study examines the influence of feature fusion on activity recognition accuracy. Feature fusion addresses a limitation of conventional models, which struggle to identify activities because of their limited capacity to capture spatial and temporal features. The study uses sensor data from four publicly available datasets: HuGaDB, PKU-MMD, LARa, and TUG. Two deep learning models, a Transformer model and a Parameter-Optimized Graph Convolutional Network (PO-GCN), were evaluated on these datasets in terms of accuracy and F1-score. The feature fusion technique combined the final-layer features of both models and fed them into a classifier. The results show that PO-GCN outperforms standard models in activity recognition. On HuGaDB, accuracy improved by 2.3% and F1-score by 2.2%; on TUG, accuracy improved by 5% and F1-score by 0.5%. LARa and PKU-MMD, by contrast, reached lower accuracies of 64% and 69%, respectively. Overall, these results indicate that fusing features improved the performance of both the Transformer model and PO-GCN.
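The fusion step described above (final-layer features from both models combined and passed to a classifier) can be illustrated with a minimal sketch. The abstract does not specify how the features are integrated or which classifier is used, so concatenation and an untrained linear softmax classifier are assumptions made here for illustration; the names `fuse_features` and `LinearClassifier`, the feature dimensions, and the toy values are all hypothetical.

```python
import math
import random


def fuse_features(transformer_feat, gcn_feat):
    """Concatenate per-sample final-layer features from the two models.

    Assumption: fusion is plain concatenation; the source only says the
    features were "integrated".
    """
    if len(transformer_feat) != len(gcn_feat):
        raise ValueError("both models must produce features for the same samples")
    return [list(t) + list(g) for t, g in zip(transformer_feat, gcn_feat)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class LinearClassifier:
    """Hypothetical stand-in classifier: one linear layer plus softmax.

    Weights are random and untrained, so predictions are not meaningful;
    the class only shows where the fused vector enters the classifier.
    """

    def __init__(self, in_dim, num_classes, seed=0):
        rng = random.Random(seed)
        self.weights = [
            [rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
            for _ in range(num_classes)
        ]
        self.bias = [0.0] * num_classes

    def predict_proba(self, x):
        logits = [
            sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(self.weights, self.bias)
        ]
        return softmax(logits)

    def predict(self, x):
        probs = self.predict_proba(x)
        return max(range(len(probs)), key=probs.__getitem__)


# Toy final-layer features for two samples (values are made up).
transformer_feat = [[0.2, 0.7, 0.1], [0.9, 0.1, 0.4]]  # 3-dim Transformer features
gcn_feat = [[0.5, 0.3], [0.0, 0.8]]                    # 2-dim PO-GCN features

fused = fuse_features(transformer_feat, gcn_feat)      # each sample is now 5-dim
clf = LinearClassifier(in_dim=len(fused[0]), num_classes=4)
probs = [clf.predict_proba(x) for x in fused]
predictions = [clf.predict(x) for x in fused]
```

In practice the classifier would be trained on the fused vectors, and the two feature sets would typically be normalized first so that neither model dominates the fused representation.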