Recent advancements in AI have democratized its deployment as a healthcare assistant. While models pretrained on large-scale visual and audio datasets have demonstrably generalized to this task, surprisingly, no studies have explored pretrained speech models, which, being trained on human-originated sounds, would intuitively bear a closer resemblance to lung sounds. This paper explores the efficacy of pretrained speech models for respiratory sound classification. We find that there is a characterization gap between speech and lung sound samples, and that data augmentation is essential to bridge this gap. However, SpecAugment, the most widely used augmentation technique for audio and speech, requires a 2-dimensional spectrogram input and therefore cannot be applied to models pretrained on raw speech waveforms. To address this, we propose RepAugment, an input-agnostic representation-level augmentation technique that not only outperforms SpecAugment but is also applicable to respiratory sound classification with waveform-pretrained models. Experimental results show that our approach outperforms SpecAugment, demonstrating a substantial improvement in the accuracy of minority disease classes of up to 7.14%.
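To make the contrast with SpecAugment concrete: because RepAugment operates on encoder outputs rather than on the input, it does not care whether the pretrained model consumed spectrograms or raw waveforms. Below is a minimal sketch of representation-level augmentation, assuming the encoder output is a (batch, time, dim) tensor; the function name `rep_augment` and the masking/noise hyperparameters are illustrative placeholders, not the paper's exact recipe.

```python
import torch

def rep_augment(reps: torch.Tensor, mask_prob: float = 0.3,
                noise_std: float = 0.1) -> torch.Tensor:
    """Sketch of representation-level augmentation.

    Operates on encoder outputs of shape (batch, time, dim), so it is
    agnostic to whether the pretrained backbone took spectrograms or
    raw waveforms as input. mask_prob and noise_std are illustrative
    values, not tuned hyperparameters from the paper.
    """
    # Randomly zero out a subset of time steps (representation masking).
    keep = (torch.rand(reps.size(0), reps.size(1), 1,
                       device=reps.device) > mask_prob).float()
    out = reps * keep
    # Perturb the surviving representations with small Gaussian noise.
    out = out + noise_std * torch.randn_like(out)
    return out
```

Because the augmentation touches only the representation tensor, it can be dropped between a frozen waveform encoder and the classification head without modifying the pretrained model itself.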