In machine learning applications, gradual data ingress is common, especially in audio processing, where incremental learning is vital for real-time analytics. Few-shot class-incremental learning addresses the challenges arising from limited incoming data. Existing methods often integrate additional trainable components, or rely on an embedding extractor frozen after training on the base session, to mitigate catastrophic forgetting and overfitting. However, using cross-entropy loss alone during base-session training is suboptimal for audio data. To address this, we propose incorporating supervised contrastive learning to refine the representation space, enhancing discriminative power and improving generalization, since it facilitates the seamless integration of incremental classes as they arrive. Experimental results on the NSynth and LibriSpeech datasets with 100 classes, as well as on the ESC dataset with 50 and 10 classes, demonstrate state-of-the-art performance.
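To make the key ingredient concrete, the following is a minimal NumPy sketch of the supervised contrastive loss (Khosla et al., 2020) that the abstract refers to. It is an illustrative implementation of the standard SupCon formulation, not the authors' released code; the function name, batch shapes, and default temperature are assumptions for the example.

```python
import numpy as np

def supcon_loss(embeddings, labels, temperature=0.07):
    """Supervised contrastive loss over one batch.

    For each anchor i, positives are all other samples sharing its label:
        L_i = -(1/|P(i)|) * sum_{p in P(i)} log( exp(z_i.z_p / t)
                                                 / sum_{a != i} exp(z_i.z_a / t) )

    embeddings : (N, D) array of feature vectors (normalized internally).
    labels     : (N,) integer class labels.
    """
    labels = np.asarray(labels)
    # L2-normalize so the dot product is cosine similarity.
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)            # exclude self-comparisons
    # Log-softmax over each row (denominator ranges over all non-self samples).
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    # Positive mask: same label, excluding the anchor itself.
    pos = (labels[:, None] == labels[None, :]) & ~np.eye(len(labels), dtype=bool)
    has_pos = pos.any(axis=1)                 # anchors with at least one positive
    per_anchor = -np.where(pos, log_prob, 0.0).sum(axis=1)[has_pos] \
                 / pos.sum(axis=1)[has_pos]
    return per_anchor.mean()
```

The loss pulls same-class embeddings together and pushes different-class embeddings apart, which is why the resulting representation space leaves room for incremental classes: new classes can be slotted in as compact clusters without retraining the extractor.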