Respiratory sounds contain crucial information for the early diagnosis of fatal lung diseases. Since the COVID-19 pandemic, there has been growing interest in contact-free medical care based on electronic stethoscopes. To this end, cutting-edge deep learning models have been developed to diagnose lung diseases; however, the task remains challenging because medical data are scarce. In this study, we demonstrate that a model pretrained on large-scale visual and audio datasets can be generalized to the respiratory sound classification task. In addition, we introduce a straightforward Patch-Mix augmentation for the Audio Spectrogram Transformer (AST), which randomly mixes patches between different samples. We further propose a novel and effective Patch-Mix Contrastive Learning method to distinguish the mixed representations in the latent space. Our method achieves state-of-the-art performance on the ICBHI dataset, improving on the previous best score by 4.08%.
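As a rough illustration of the Patch-Mix idea, the sketch below assumes each spectrogram has already been split into the same number of patches, as AST does before its Transformer encoder. It swaps a fixed fraction of patch positions with the corresponding patches of a randomly paired sample in the batch and records the ratio λ of patches kept from the original sample. The names `patch_mix` and `mixed_cross_entropy`, the fixed `mix_ratio` parameter, and the choice to mix the same positions across the whole batch are all hypothetical. The label-mixing loss is a CutMix-style cross-entropy included only to show how λ can be used; it is not the paper's Patch-Mix Contrastive Learning objective. This is not the authors' implementation.

```python
import math
import random
from typing import List, Sequence, Tuple

Patch = List[float]  # one flattened spectrogram patch (hypothetical representation)


def patch_mix(
    batch: Sequence[Sequence[Patch]],
    mix_ratio: float,
    rng: random.Random,
) -> Tuple[List[List[Patch]], List[int], float]:
    """Randomly replace patches of each sample with patches from a partner sample.

    batch[b] is the patch sequence of sample b; all sequences have equal length.
    Returns the mixed batch, the partner index of each sample, and lam, the
    fraction of patch positions that still come from the original sample.
    """
    n = len(batch)
    num_patches = len(batch[0])
    perm = list(range(n))
    rng.shuffle(perm)  # perm[b] is the sample that donates patches to sample b
    # Pick which patch positions are replaced (assumed shared across the batch).
    num_mixed = int(round(mix_ratio * num_patches))
    mixed_positions = set(rng.sample(range(num_patches), num_mixed))
    mixed_batch = []
    for b in range(n):
        src, partner = batch[b], batch[perm[b]]
        mixed = [
            list(partner[p]) if p in mixed_positions else list(src[p])
            for p in range(num_patches)
        ]
        mixed_batch.append(mixed)
    lam = 1.0 - num_mixed / num_patches
    return mixed_batch, perm, lam


def mixed_cross_entropy(probs: Sequence[float], y_a: int, y_b: int, lam: float) -> float:
    """CutMix-style loss: weight the two source labels by the patch ratio lam."""
    return lam * -math.log(probs[y_a]) + (1.0 - lam) * -math.log(probs[y_b])
```

For example, with 8 patches per sample and `mix_ratio=0.25`, two patch positions are swapped and `lam` is 0.75, so the original label receives weight 0.75 and the partner's label 0.25. As in CutMix, a sample that happens to be paired with itself is left unchanged even though `lam` is below 1.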