Acoustic scene classification (ASC) has important real-world applications. Recently, deep learning-based methods have been widely employed for ASC. However, these methods are still not lightweight enough, and their performance remains unsatisfactory. To address these problems, we propose a deep space separable distillation network. First, the network performs high-low frequency decomposition on the log-mel spectrogram, significantly reducing computational complexity while maintaining model performance. Second, we design three lightweight operators specifically for ASC: Separable Convolution (SC), Orthonormal Separable Convolution (OSC), and Separable Partial Convolution (SPC). These operators extract features efficiently in ASC tasks. Experimental results demonstrate that the proposed method achieves a performance gain of 9.8% over currently popular deep learning methods, while also having a smaller parameter count and lower computational complexity.
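To illustrate why frequency decomposition and separable operators can reduce model size, the following is a minimal sketch and not the proposed network itself. It assumes that the high-low frequency decomposition can be approximated by splitting the mel bins of a log-mel spectrogram at a chosen index, and that a separable convolution takes the common depthwise-plus-pointwise form; the abstract does not specify either detail. The function names `split_high_low`, `standard_conv_params`, and `depthwise_separable_params` are hypothetical, and bias terms are ignored in the parameter counts.

```python
def split_high_low(spec, split_bin=None):
    """Split a spectrogram (list of mel-bin rows) into low and high bands.

    Assumption: the split point defaults to the middle mel bin; the
    actual decomposition used by the proposed network may differ.
    """
    n_mels = len(spec)
    if split_bin is None:
        split_bin = n_mels // 2
    return spec[:split_bin], spec[split_bin:]


def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depthwise k x k conv followed by a 1 x 1 pointwise conv.

    Assumption: this is the common depthwise separable form, used here
    only as a stand-in for the SC operator named in the text.
    """
    depthwise = c_in * k * k
    pointwise = c_in * c_out
    return depthwise + pointwise


# Toy "log-mel spectrogram": 8 mel bins x 4 frames of placeholder values.
spec = [[float(m + t) for t in range(4)] for m in range(8)]
low, high = split_high_low(spec)

# Compare weight counts for a 64 -> 128 channel layer with 3 x 3 kernels.
std = standard_conv_params(64, 128, 3)
sep = depthwise_separable_params(64, 128, 3)
print(len(low), len(high), std, sep)
```

Under these assumptions, the separable layer uses roughly one eighth of the weights of the standard layer for the same channel configuration, which is the kind of saving that lightweight operators aim for.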