New sound classes constantly emerge, often with only a few samples, making it challenging for models to adapt to dynamic acoustic environments. This challenge motivates us to address a new problem: few-shot class-incremental audio classification. This study aims to enable a model to continually recognize new sound classes from only a few training samples of those classes while retaining the classes it has already learned. To this end, we propose a method that generates discriminative prototypes and uses them to expand the model's classifier so that it recognizes sounds of both new and learned classes. The model is first trained with a random episodic training strategy, and its backbone is then used to generate the prototypes. A dynamic relation projection module refines the prototypes to enhance their discriminability. Results on two datasets (derived from the NSynth and FSD-MIX-CLIPS corpora) show that the proposed method outperforms three state-of-the-art methods in both average accuracy and performance dropping rate.
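To make the idea of expanding a classifier with prototypes more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a prototype is simply the mean embedding of a class's few support samples and that classification uses cosine similarity to the nearest prototype; both are common choices in few-shot learning, but the abstract does not specify them. The random episodic training and the dynamic relation projection module are not modeled here. The function names and the toy embeddings are hypothetical, and the performance dropping rate is assumed to follow the common definition (accuracy in the first session minus accuracy in the last session).

```python
import numpy as np

def class_prototypes(embeddings, labels):
    """Mean embedding per class; returns (sorted class ids, prototype matrix)."""
    classes = np.unique(labels)
    protos = np.stack([embeddings[labels == c].mean(axis=0) for c in classes])
    return classes, protos

def expand_classifier(weights, new_protos):
    """Append new-class prototypes as extra rows of the classifier weights."""
    return np.concatenate([weights, new_protos], axis=0)

def predict(weights, queries):
    """Nearest-prototype prediction by cosine similarity; returns row indices."""
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    return np.argmax(q @ w.T, axis=1)

def average_accuracy(session_acc):
    """Mean accuracy over all incremental sessions."""
    return float(np.mean(session_acc))

def performance_dropping_rate(session_acc):
    """Assumed definition: first-session accuracy minus last-session accuracy."""
    return float(session_acc[0] - session_acc[-1])

# Toy setup (hypothetical): embeddings stand in for backbone outputs.
rng = np.random.default_rng(0)
dim = 8
base_weights = np.eye(dim)[:3]          # prototypes of 3 already-learned classes

# Incremental session: 2 new classes with 5 support samples each.
centers = np.eye(dim)[3:5]
support = np.concatenate(
    [c + 0.05 * rng.standard_normal((5, dim)) for c in centers]
)
support_labels = np.repeat([3, 4], 5)

_, new_protos = class_prototypes(support, support_labels)
weights = expand_classifier(base_weights, new_protos)  # now covers 5 classes

# One clean query per class, old and new alike.
queries = np.eye(dim)[:5]
preds = predict(weights, queries)
print(weights.shape, preds)
```

Because the classifier is extended by appending rows rather than retraining, the rows for learned classes are left unchanged when new classes arrive, which is one simple way to limit forgetting under these assumptions.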