Existing activity tracker datasets for human activity recognition are typically obtained by having participants perform predefined activities in an enclosed environment under supervision. This results in small datasets with few activities and limited heterogeneity, lacking the mixed and nuanced movements normally found in free-living scenarios. As such, models trained on laboratory-style datasets may not generalise out of sample. To address this problem, we introduce a new dataset collected with wrist-worn accelerometers, wearable cameras, and sleep diaries, enabling data collection for over 24 hours in a free-living setting. The result is CAPTURE-24, a large activity tracker dataset collected in the wild from 151 participants, amounting to 3883 hours of accelerometer data, of which 2562 hours are annotated. CAPTURE-24 is two to three orders of magnitude larger than existing publicly available datasets, a scale that is critical to developing accurate human activity recognition models.