Deep generative models have emerged as a promising approach in the medical image domain to address data scarcity. However, their use for sequential data such as respiratory sounds remains less explored. In this work, we propose a straightforward approach to augmenting imbalanced respiratory sound data that uses an audio diffusion model as a conditional neural vocoder. We also present a simple yet effective adversarial fine-tuning method that aligns features between synthetic and real respiratory sound samples to improve respiratory sound classification performance. Our experimental results on the ICBHI dataset show that the proposed adversarial fine-tuning is effective, whereas using the conventional augmentation method alone degrades performance. Moreover, our method outperforms the baseline by 2.24% on the ICBHI Score and improves the accuracy of the minority classes by up to 26.58%. As supplementary material, we provide the code at https://github.com/kaen2891/adversarial_fine-tuning_using_generated_respiratory_sound.
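The abstract reports results as an ICBHI Score and as per-class accuracy for the minority classes. For orientation, the ICBHI Score is conventionally defined, in the four-class ICBHI 2017 setup, as the mean of specificity and sensitivity. Specificity is the recall on normal cycles. Sensitivity is pooled over the abnormal classes (crackle, wheeze, both), where a prediction counts only if it names the exact abnormal class. The sketch below assumes that standard definition and a hypothetical label encoding (0 normal, 1 crackle, 2 wheeze, 3 both). It is not taken from the authors' code, and the function name `icbhi_metrics` is illustrative.

```python
from collections import Counter

# Hypothetical label encoding (assumption): 0 normal, 1 crackle, 2 wheeze, 3 both.
NORMAL = 0
CLASSES = (0, 1, 2, 3)


def icbhi_metrics(y_true, y_pred):
    """Return specificity, sensitivity, ICBHI score, and per-class accuracy.

    Assumes the conventional ICBHI 2017 definitions described above.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    total = Counter(y_true)  # number of cycles per true class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # exact hits per class

    # Specificity: fraction of normal cycles predicted as normal.
    sp = correct[NORMAL] / total[NORMAL] if total[NORMAL] else 0.0

    # Sensitivity: abnormal cycles predicted as their exact abnormal class,
    # pooled over crackle, wheeze, and both.
    abnormal_total = sum(total[c] for c in CLASSES if c != NORMAL)
    abnormal_correct = sum(correct[c] for c in CLASSES if c != NORMAL)
    se = abnormal_correct / abnormal_total if abnormal_total else 0.0

    # Per-class accuracy (recall), the quantity used for minority classes.
    per_class = {c: (correct[c] / total[c] if total[c] else 0.0) for c in CLASSES}

    return {"sp": sp, "se": se, "score": (sp + se) / 2, "per_class": per_class}


if __name__ == "__main__":
    # Toy labels, not data from the paper.
    m = icbhi_metrics([0, 0, 0, 0, 1, 1, 2, 3], [0, 0, 0, 1, 1, 0, 2, 2])
    print(m)  # sp 0.75, se 0.5, score 0.625
```

Because sensitivity is pooled over all abnormal cycles, a model can raise the ICBHI Score mostly through the larger abnormal classes. This is why per-class accuracy on the rare classes is worth reporting separately.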