An Examination of Wearable Sensors and Video Data Capture for Human Exercise Classification

Wearable sensors such as Inertial Measurement Units (IMUs) are often used to assess how well a person performs an exercise. Common approaches rely on handcrafted features based on domain expertise or on features extracted automatically using time series analysis. Achieving high classification accuracy typically requires multiple sensors, which is impractical: the sensors must be calibrated and synchronized, and they may cause discomfort when worn for long periods. Recent work using computer vision techniques has shown similar performance from video, without manual feature engineering and without pitfalls such as sensor calibration and placement on the body. In this paper, we compare IMUs with a video-based approach for human exercise classification on two real-world datasets covering the Military Press and Rowing exercises. Specifically, we compare a single camera capturing video in the frontal view against 5 IMUs placed on different parts of the body. We observe that the single-camera approach can outperform a single IMU by 10 percentage points on average, and that a minimum of 3 IMUs is required to outperform a single camera. We also observe that applying multivariate time series classifiers to the raw data outperforms traditional approaches based on handcrafted or automatically extracted features. Finally, we show that an ensemble model combining data from a single camera and a single IMU outperforms either modality alone. Our work opens up new and more realistic avenues for this application, in which video captured with a readily available smartphone camera, combined with a single sensor, can be used for effective human exercise classification.
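To make the ensemble idea concrete, the following minimal sketch shows one common way to combine two modalities: late fusion by a weighted average of per-class probabilities. The abstract does not state how the ensemble is built, so this is an illustrative assumption, not the authors' method. The function names (`fuse_probabilities`, `predict_label`), the class labels, and the probability values are all hypothetical.

```python
from typing import List, Optional, Sequence


def fuse_probabilities(
    per_model_probs: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Weighted average of class probabilities from several models for one sample.

    Assumption: each model (e.g. a video-based classifier and an IMU-based
    classifier) outputs one probability per class, in the same class order.
    """
    n_models = len(per_model_probs)
    if n_models == 0:
        raise ValueError("at least one model output is required")
    n_classes = len(per_model_probs[0])
    if any(len(p) != n_classes for p in per_model_probs):
        raise ValueError("all models must score the same number of classes")
    if weights is None:
        weights = [1.0] * n_models  # equal trust in every modality
    if len(weights) != n_models:
        raise ValueError("one weight per model is required")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return [
        sum(w * p[c] for w, p in zip(weights, per_model_probs)) / total
        for c in range(n_classes)
    ]


def predict_label(probs: Sequence[float], labels: Sequence[str]) -> str:
    """Return the label with the highest fused probability."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return labels[best]


# Hypothetical class names and probabilities, for illustration only.
LABELS = ["normal", "deviation_a", "deviation_b"]
video_probs = [0.50, 0.30, 0.20]  # made-up output of a video-based classifier
imu_probs = [0.20, 0.70, 0.10]    # made-up output of a single-IMU classifier

fused = fuse_probabilities([video_probs, imu_probs])
print(predict_label(fused, LABELS))  # prints "deviation_a"
```

In this toy case the video model alone would predict `normal`, but the more confident IMU model shifts the fused prediction. The `weights` argument lets one modality count for more than the other, for example if validation accuracy differs between them.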
