Infant sleep is critical to brain and behavioral development. Prior studies of infant sleep/wake classification have largely relied either on expensive and burdensome laboratory polysomnography (PSG) or on wearable devices that collect single-modality data. To ease data collection and improve detection accuracy, we used a multi-modal wearable device, LittleBeats (LB), to collect audio, electrocardiogram (ECG), and inertial measurement unit (IMU) data from a cohort of 28 infants. To demonstrate the potential of such multi-modal data, we employed a three-branch (audio/ECG/IMU), large-scale, transformer-based neural network (NN). We pretrained each branch independently on its respective modality, then fine-tuned the model by fusing the pretrained transformer layers with cross-attention. We show that multi-modal data significantly improve sleep/wake classification (accuracy = 0.880) compared with a single modality (accuracy = 0.732). Our approach to multi-modal mid-level fusion may be adaptable to a diverse range of architectures and tasks, expanding future directions for infant behavioral research.
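The mid-level fusion idea described in the abstract, in which three modality branches exchange information through cross-attention before a sleep/wake decision, can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the fusion details, so the sketch assumes single-head scaled dot-product attention with no learned projections, a shared embedding size `DIM`, a residual connection, mean pooling, and a logistic output. The names `cross_attention`, `fuse`, and `sleep_probability`, and all numeric values, are hypothetical, and random vectors stand in for the outputs of pretrained branches.

```python
import math
import random

DIM = 4  # hypothetical shared embedding size across the three branches


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, context):
    """Scaled dot-product attention: each query token becomes a weighted
    average of the context tokens (assumption: no learned projections)."""
    scale = math.sqrt(len(queries[0]))
    attended = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in context]
        weights = softmax(scores)
        attended.append(
            [sum(w * v[i] for w, v in zip(weights, context)) for i in range(len(q))]
        )
    return attended


def fuse(branches):
    """Each branch attends to the tokens of the other two branches;
    the attended result is added back to the branch (residual)."""
    fused = {}
    for name, tokens in branches.items():
        context = [t for other, toks in branches.items() if other != name for t in toks]
        attended = cross_attention(tokens, context)
        fused[name] = [[x + y for x, y in zip(t, a)] for t, a in zip(tokens, attended)]
    return fused


def mean_pool(tokens):
    """Average a token sequence into one vector."""
    return [sum(col) / len(tokens) for col in zip(*tokens)]


def sleep_probability(fused, weights, bias):
    """Concatenate pooled branch vectors and apply a logistic classifier."""
    pooled = [x for name in sorted(fused) for x in mean_pool(fused[name])]
    logit = sum(w * x for w, x in zip(weights, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Stand-in branch outputs: random tokens with different sequence lengths.
rng = random.Random(0)
branches = {
    "audio": [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(5)],
    "ecg": [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(3)],
    "imu": [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(4)],
}
fused = fuse(branches)
weights = [rng.uniform(-1, 1) for _ in range(3 * DIM)]  # untrained, illustrative
p_sleep = sleep_probability(fused, weights, bias=0.0)
```

In a real model each branch would be a pretrained transformer encoder, the attention would use learned query, key, and value projections (typically multi-head), and the classifier weights would be learned during fine-tuning. The sketch only shows how information from the audio, ECG, and IMU streams can be mixed at the token level rather than at the input or decision level.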