The Human Activity Recognition (HAR) pipeline today relies on standard, well-established procedures. However, some of these conventional approaches lead to overestimated accuracy. In particular, segmenting data with sliding windows and then evaluating with standard random k-fold cross-validation produces biased results. Surprisingly, an analysis of earlier literature and present-day work shows that this combination is common in state-of-the-art HAR studies. It is important to raise awareness in the scientific community of this problem, whose negative effects are being overlooked. Otherwise, the publication of biased results makes papers that report lower accuracies, obtained with correct, unbiased methods, harder to publish. Several experiments with different types of datasets and different types of classification models allow us to exhibit the problem and show that it persists independently of the method or dataset.
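To make the concern concrete, the following is a minimal sketch of one way such a bias can arise, assuming windows that overlap by 50%. It does not reproduce the experiments; the subject count, recording length, window width, and the helper names `sliding_windows` and `leaked_fraction` are all hypothetical choices made for illustration. The sketch only counts how many test windows share raw samples with a training window under random folds versus subject-wise folds.

```python
import random

def sliding_windows(n_samples, width, step):
    """Return (start, end) index pairs of fixed-width windows over one recording."""
    return [(s, s + width) for s in range(0, n_samples - width + 1, step)]

def leaked_fraction(windows, test_idx):
    """Fraction of test windows sharing raw samples with a training window
    taken from the same subject's recording."""
    test_set = set(test_idx)
    train = [w for i, w in enumerate(windows) if i not in test_set]
    leaked = 0
    for i in test_idx:
        subj, start, end = windows[i]
        # Two half-open intervals overlap when each starts before the other ends.
        if any(s == subj and a < end and start < b for s, a, b in train):
            leaked += 1
    return leaked / len(test_idx)

# Hypothetical setup: 4 subjects, 1000 samples each,
# 100-sample windows with a 50-sample step (50% overlap).
n_subjects, k = 4, 4
windows = [(subj, a, b)
           for subj in range(n_subjects)
           for a, b in sliding_windows(1000, 100, 50)]

# Random k-fold: shuffle all windows, ignoring which subject they came from.
rng = random.Random(0)
idx = list(range(len(windows)))
rng.shuffle(idx)
random_folds = [idx[f::k] for f in range(k)]

# Subject-wise folds: every window of a subject lands in the same fold.
subject_folds = [[i for i, w in enumerate(windows) if w[0] == subj]
                 for subj in range(n_subjects)]

random_leak = [leaked_fraction(windows, fold) for fold in random_folds]
subject_leak = [leaked_fraction(windows, fold) for fold in subject_folds]

print("random k-fold leak per fold:", random_leak)
print("subject-wise leak per fold: ", subject_leak)
```

Under the subject-wise split no test window can share samples with the training set, because all windows from a recording stay together. Under the random split, neighbouring windows from the same recording are scattered across folds, so a classifier may be tested on data it has partly seen during training.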