Despite steady advances in deep learning in recent years, models still need large amounts of training data to avoid overfitting. Synthetic datasets generated with generative adversarial networks (GANs) have recently been used to address this problem. However, GAN-based methods are usually hard to train or fail to generate high-quality data samples. In this paper, we propose a data augmentation technique for environmental sound classification based on the diffusion probabilistic model, with DPM-Solver$++$ for fast sampling. In addition, to ensure the quality of the generated spectrograms, we train a top-k selection discriminator on the dataset. Experimental results show that the synthesized spectrograms have features similar to those of the original dataset and significantly increase the classification accuracy of several state-of-the-art models compared with traditional data augmentation techniques. The code is publicly available at https://github.com/JNAIC/DPMs-for-Audio-Data-Augmentation.
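As a rough illustration of the top-k selection step, the sketch below keeps the k generated spectrograms that a discriminator scores highest. The abstract does not state the exact selection criterion, so this assumes the discriminator yields one scalar quality score per sample; the function name `select_top_k` and its arguments are hypothetical and are not taken from the released code.

```python
import numpy as np


def select_top_k(samples, scores, k):
    """Keep the k samples with the highest scores, best first.

    Assumption: `scores` holds one scalar per generated sample, e.g. a
    discriminator's confidence. The real criterion may differ.
    """
    samples = np.asarray(samples)
    scores = np.asarray(scores, dtype=float)
    if scores.ndim != 1 or scores.shape[0] != samples.shape[0]:
        raise ValueError("scores must be 1-D with one entry per sample")
    if not 1 <= k <= scores.shape[0]:
        raise ValueError("k must be between 1 and the number of samples")
    # Stable sort on negated scores: descending order, ties keep input order.
    order = np.argsort(-scores, kind="stable")[:k]
    return samples[order], order


# Five placeholder "spectrograms" of shape (2, 3) with made-up scores.
generated = np.arange(5 * 2 * 3).reshape(5, 2, 3)
quality = [0.1, 0.9, 0.4, 0.8, 0.2]
kept, kept_idx = select_top_k(generated, quality, k=2)
```

Only the retained samples would then be added to the training set alongside the original data; how k is chosen is not specified in the abstract.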