Audio classification aims to recognize audio signals such as speech commands or sound events. However, current audio classifiers are susceptible to perturbations and adversarial attacks. In addition, real-world audio classification tasks often suffer from limited labeled data. To help bridge these gaps, previous work developed neuro-inspired convolutional neural networks (CNNs) that perform sparse coding via the Locally Competitive Algorithm (LCA) in the first layer (i.e., LCANets) for computer vision. LCANets are trained with a combination of supervised and unsupervised learning, which reduces their dependency on labeled samples. Motivated by the fact that activity in the auditory cortex is also sparse, we extend LCANets to audio recognition tasks and introduce LCANets++, which are CNNs that perform sparse coding in multiple layers via LCA. We demonstrate that LCANets++ are more robust than standard CNNs and LCANets against perturbations, e.g., background noise, as well as black-box and white-box attacks, e.g., evasion and fast gradient sign method (FGSM) attacks.
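To make the sparse coding step concrete, the following is a minimal NumPy sketch of LCA inference for a single input vector. It is an illustration under stated assumptions, not the LCANets++ implementation: it uses a dense, random, unit-norm dictionary rather than learned convolutional features, and the function names (`soft_threshold`, `lca_sparse_code`, `make_demo`) and parameter values (`lam`, `tau`, `n_steps`) are hypothetical choices made for this example.

```python
import numpy as np


def soft_threshold(u, lam):
    """Map membrane potentials u to activations; values with |u| <= lam become exactly 0."""
    return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)


def lca_sparse_code(x, dictionary, lam=0.1, tau=10.0, n_steps=200):
    """Approximate a sparse code for x with LCA dynamics (Euler integration).

    x          : input vector, shape (n_features,)
    dictionary : shape (n_features, n_atoms), columns assumed unit-norm
    lam        : threshold (assumed value), controls sparsity
    tau        : time constant (assumed value); the Euler step is 1 / tau
    """
    n_atoms = dictionary.shape[1]
    drive = dictionary.T @ x                                  # feed-forward input to each neuron
    inhibition = dictionary.T @ dictionary - np.eye(n_atoms)  # lateral competition, no self-term
    u = np.zeros(n_atoms)                                     # membrane potentials
    for _ in range(n_steps):
        a = soft_threshold(u, lam)
        # Leaky integration: driven by the input, inhibited by active neighbours.
        u += (drive - u - inhibition @ a) / tau
    return soft_threshold(u, lam)


def make_demo(seed=0, n_features=64, n_atoms=128, n_active=3):
    """Build a synthetic signal that is a combination of a few dictionary atoms."""
    rng = np.random.default_rng(seed)
    dictionary = rng.standard_normal((n_features, n_atoms))
    dictionary /= np.linalg.norm(dictionary, axis=0, keepdims=True)
    true_code = np.zeros(n_atoms)
    active = rng.choice(n_atoms, size=n_active, replace=False)
    true_code[active] = rng.uniform(1.0, 2.0, size=n_active)
    return dictionary @ true_code, dictionary, true_code
```

Calling `x, D, _ = make_demo()` followed by `code = lca_sparse_code(x, D)` yields a code in which most entries are exactly zero, while `D @ code` approximates `x`. In a network of the kind described above, such sparse activations would take the place of ordinary convolutional feature maps in the layers that use LCA.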