Deep neural networks (DNNs) have achieved remarkable success in numerous applications, a success largely attributed to the availability of massive, high-quality training datasets. However, processing such massive training data demands substantial computational and storage resources. Dataset distillation is a promising solution to this problem: it compresses a large dataset into a much smaller distilled dataset, such that a model trained on the distilled dataset achieves performance comparable to a model trained on the full dataset. While dataset distillation has been demonstrated on image data, no prior work has explored dataset distillation for audio data. In this work, we propose, for the first time, a Dataset Distillation Framework for Audio Data (DDFAD). Specifically, we first propose the Fused Differential MFCC (FD-MFCC) as the extracted feature for audio data. The FD-MFCC features are then distilled with the matching training trajectories distillation method. Finally, we propose an audio signal reconstruction algorithm based on the Griffin-Lim algorithm to recover the audio signal from the distilled FD-MFCC. Extensive experiments demonstrate the effectiveness of DDFAD on various audio datasets. In addition, we show that DDFAD has promising prospects in downstream applications such as continual learning and neural architecture search.
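To make the feature-extraction step concrete, the following is a minimal sketch of FD-MFCC extraction, assuming the fused feature stacks the static MFCCs with their first- and second-order differential (delta) coefficients; the function name and the parameters `n_mfcc` and `sr` are illustrative choices, not the paper's exact configuration.

```python
import numpy as np
import librosa

def extract_fd_mfcc(path, n_mfcc=40, sr=16000):
    """Sketch of FD-MFCC: static MFCCs fused with their deltas.

    Assumes the fusion is a simple stack along the feature axis;
    the paper's exact fusion scheme may differ.
    """
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # shape (n_mfcc, T)
    delta1 = librosa.feature.delta(mfcc, order=1)           # first-order differential
    delta2 = librosa.feature.delta(mfcc, order=2)           # second-order differential
    # Fuse the three streams along the feature axis: (3 * n_mfcc, T)
    return np.concatenate([mfcc, delta1, delta2], axis=0)
```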
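The distillation step follows the matching-training-trajectories idea: a student network is trained for a few steps on the distilled features, and the distilled features are optimized so that the student's weights land close to a later checkpoint of an expert trained on the full dataset. Below is a hedged sketch of the standard trajectory-matching objective (Cazenavette et al.), not necessarily the paper's exact loss; the parameter lists are assumed to be matched sequences of tensors.

```python
import torch

def trajectory_matching_loss(student_final, expert_start, expert_target):
    """Normalized squared distance between the student's weights after
    a few steps on the distilled data and a later expert checkpoint,
    scaled by how far the expert itself moved over the same interval."""
    num = sum(((s - e) ** 2).sum() for s, e in zip(student_final, expert_target))
    den = sum(((s0 - e) ** 2).sum() for s0, e in zip(expert_start, expert_target))
    return num / den
```

For the reconstruction step, librosa's inverse transforms already chain an MFCC-to-mel inversion with Griffin-Lim phase estimation. The sketch below shows only this standard pipeline applied to the static MFCC slice of a distilled feature; the paper proposes its own reconstruction algorithm built on Griffin-Lim, which this does not reproduce.

```python
import librosa

def mfcc_to_waveform(fd_mfcc, n_mfcc=40, sr=16000, n_iter=32):
    """Recover a waveform from the static MFCC slice of a distilled
    FD-MFCC feature via librosa's Griffin-Lim-based inversion."""
    mfcc = fd_mfcc[:n_mfcc]  # keep the static coefficients only
    mel = librosa.feature.inverse.mfcc_to_mel(mfcc)  # approximate, lossy inversion
    # mel_to_audio estimates the missing phase with Griffin-Lim internally
    return librosa.feature.inverse.mel_to_audio(mel, sr=sr, n_iter=n_iter)
```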