Despite steady advances in deep learning techniques in recent years, models still need large amounts of training data to avoid overfitting. Synthetic datasets generated with generative adversarial networks (GANs) have recently been used to address this problem. However, GAN-based methods are often hard to train or fail to generate high-quality data samples. In this paper, we propose a data augmentation technique for environmental sound classification based on the diffusion probabilistic model, with DPM-Solver$++$ for fast sampling. In addition, to ensure the quality of the generated spectrograms, we train a top-k selection discriminator on the dataset. According to the experimental results, the synthesized spectrograms have features similar to those of the original dataset and can significantly increase the classification accuracy of different state-of-the-art models compared with traditional data augmentation techniques. The code is publicly available at \url{https://github.com/JNAIC/DPMs-for-Audio-Data-Augmentation}.
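The top-k selection step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the discriminator scores samples or how k is chosen, so everything here is an assumption made for illustration: the function name `select_top_k`, the `score_fn` callable standing in for a trained discriminator, and the toy scores are all hypothetical and are not taken from the authors' repository.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def select_top_k(candidates: Sequence[T],
                 score_fn: Callable[[T], float],
                 k: int) -> List[T]:
    """Return the k candidates with the highest score, best first.

    score_fn is a stand-in (assumption) for a trained discriminator that
    maps a generated spectrogram to a quality score; higher is better.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    # Sort by descending score and keep only the first k entries.
    ranked = sorted(candidates, key=score_fn, reverse=True)
    return list(ranked[:k])


if __name__ == "__main__":
    # Toy stand-ins: sample IDs and made-up discriminator scores.
    generated = ["spec_a", "spec_b", "spec_c", "spec_d"]
    toy_scores = {"spec_a": 0.1, "spec_b": 0.9, "spec_c": 0.5, "spec_d": 0.7}
    kept = select_top_k(generated, toy_scores.__getitem__, k=2)
    print(kept)  # ['spec_b', 'spec_d']
```

In an augmentation pipeline of this kind, only the retained samples would be added to the training set; the rest would be discarded.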