Fine-Grained Sports, Yoga, and Dance Postures Recognition: A Benchmark Analysis

Human body-pose estimation is a complex problem in computer vision. Recent research interest has widened to Sports, Yoga, and Dance (SYD) postures because of their role in maintaining health. Recognizing SYD pose categories is regarded as a fine-grained image classification task due to the complex movement of body parts. Deep Convolutional Neural Networks (CNNs) have significantly improved performance on various human body-pose estimation problems. Although decent progress has been achieved in yoga posture recognition using deep learning techniques, fine-grained sports and dance recognition still needs ample research attention. In particular, no public benchmark image dataset with sufficient inter-class and intra-class variation is yet available for sports and dance posture classification. To address this limitation, we propose two image datasets, one covering 102 sport categories and another covering 12 dance styles. For yoga postures, we use two public datasets: Yoga-82, which contains 82 classes, and Yoga-107, which contains 107 classes. We evaluate these four SYD datasets with the proposed deep model, SYD-Net, which integrates a patch-based attention (PbA) mechanism on top of standard backbone CNNs. The PbA module uses self-attention to learn contextual information from a set of uniform and multi-scale patches and emphasizes discriminative features to capture the semantic correlation among patches. In addition, random erasing data augmentation is applied to improve performance. The proposed SYD-Net achieves state-of-the-art accuracy on Yoga-82 using five base CNNs. SYD-Net's accuracy on the other datasets is also remarkable, implying its efficiency. Our Sports-102 and Dance-12 datasets are publicly available at https://sites.google.com/view/syd-net/home.
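
To make the patch-based attention idea concrete, the following is a minimal sketch in plain Python, not the SYD-Net implementation. It assumes a small H x W x C feature map stored as nested lists, average-pools it over uniform grids at two scales to obtain patch descriptors, and applies scaled dot-product self-attention across those descriptors. The function names (`pool_patches`, `self_attention`, `patch_attention`), the grid sizes (1 and 2), the average pooling, and the identity query/key/value projections are all illustrative assumptions; the abstract does not specify these details, and a real model would operate on backbone CNN features with learned projections.

```python
import math
import random


def pool_patches(fmap, grid):
    """Average-pool an H x W x C feature map into grid * grid patch descriptors."""
    h, w, c = len(fmap), len(fmap[0]), len(fmap[0][0])
    patches = []
    for gy in range(grid):
        for gx in range(grid):
            y0, y1 = gy * h // grid, (gy + 1) * h // grid
            x0, x1 = gx * w // grid, (gx + 1) * w // grid
            cells = [fmap[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            patches.append(
                [sum(cell[k] for cell in cells) / len(cells) for k in range(c)]
            )
    return patches


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with identity Q/K/V (an assumption)."""
    d = len(tokens[0])
    outputs, weights = [], []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        row = softmax(scores)
        weights.append(row)
        # Each output is an attention-weighted mix of all patch descriptors.
        outputs.append(
            [sum(row[j] * tokens[j][k] for j in range(len(tokens))) for k in range(d)]
        )
    return outputs, weights


def patch_attention(fmap, grids=(1, 2)):
    """Pool patches at several scales, attend across them, then average."""
    tokens = [p for g in grids for p in pool_patches(fmap, g)]
    attended, weights = self_attention(tokens)
    d = len(attended[0])
    feature = [sum(t[k] for t in attended) / len(attended) for k in range(d)]
    return feature, weights


# Hypothetical 4 x 4 feature map with 8 channels, standing in for CNN output.
random.seed(0)
fmap = [[[random.random() for _ in range(8)] for _ in range(4)] for _ in range(4)]
feature, weights = patch_attention(fmap)
```

Here `feature` is a single 8-dimensional descriptor and `weights` is a 5 x 5 matrix (one whole-image patch plus four quadrant patches) whose rows each sum to 1. Row i shows how strongly patch i draws on every other patch, which is the sense in which attention relates patches to one another.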
