Health-related acoustic signals, such as cough and breathing sounds, are relevant for medical diagnosis and continuous health monitoring. Most existing machine learning approaches for health acoustics are trained and evaluated on specific tasks, limiting their generalizability across various healthcare applications. In this paper, we leverage a self-supervised learning framework, SimCLR with a Slowfast NFNet backbone, for contrastive learning of health acoustics. A crucial aspect of optimizing Slowfast NFNet for this application lies in identifying effective audio augmentations. We conduct an in-depth analysis of various audio augmentation strategies and demonstrate that an appropriate augmentation strategy enhances the performance of the Slowfast NFNet audio encoder across a diverse set of health acoustic tasks. Our findings reveal that when augmentations are combined, they can produce synergistic effects that exceed the benefits seen when each is applied individually.
翻译:健康相关的声学信号(如咳嗽声和呼吸声)在医学诊断和持续健康监测中具有重要意义。现有的大多数健康声学机器学习方法针对特定任务进行训练和评估,限制了其在不同医疗应用场景中的泛化能力。本文采用自监督学习框架SimCLR(基于Slowfast NFNet骨干网络)来实现健康声学信号的对比学习。优化Slowfast NFNet在此应用中的关键环节在于识别有效的音频增强策略。我们深入分析了多种音频增强方法,并证明适当的增强策略能够提升Slowfast NFNet音频编码器在多样化健康声学任务上的表现。研究结果表明,当多种增强方法组合使用时,会产生超越各自单独应用效果的协同效应。