To date, state-of-the-art activity recognition from wearable sensors has relied on algorithms trained to classify fixed windows of data. In contrast, video-based Human Activity Recognition, known as Temporal Action Localization (TAL), follows a segment-based prediction approach, localizing activity segments in a timeline of arbitrary length. This paper is the first to systematically demonstrate the applicability of state-of-the-art TAL models to both offline and near-online Human Activity Recognition (HAR), using raw inertial data as well as pre-extracted latent features as input. Offline prediction results show that TAL models outperform popular inertial models on a multitude of HAR benchmark datasets, with improvements of as much as 26% in F1-score. We show that by analyzing timelines as a whole, TAL models produce more coherent segments and achieve higher NULL-class accuracy across all datasets. While TAL is less suited for the immediate classification of small windows of data, it offers a compelling perspective on inertial-based HAR -- alleviating the need for fixed-size windows and enabling algorithms to recognize activities of arbitrary length. With design choices and training concepts yet to be explored, we argue that TAL architectures could be of significant value to the inertial-based HAR community. The code and data required to reproduce our experiments are publicly available via github.com/mariusbock/tal_for_har.