Acoustic scene classification (ASC) models on edge devices typically operate under fixed class assumptions and lack the transferability needed for real-world applications, which must adapt to new or refined acoustic categories. We propose ContrastASC, which learns generalizable acoustic scene representations by structuring the embedding space to preserve semantic relationships between scenes, enabling adaptation to unseen categories without retraining. Our approach combines supervised contrastive fine-tuning of pre-trained models with contrastive representation distillation to transfer this structured knowledge to compact student models. In our evaluation, ContrastASC demonstrates improved few-shot adaptation to unseen categories while maintaining strong closed-set performance.
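As a concrete illustration of the fine-tuning stage, below is a minimal PyTorch sketch of the supervised contrastive (SupCon) objective it builds on, assuming L2-normalized scene embeddings from the pre-trained encoder; the function name, batch layout, and temperature value are illustrative rather than the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def supcon_loss(embeddings, labels, temperature=0.07):
    """Supervised contrastive (SupCon) loss over one batch.

    embeddings: (N, D) scene embeddings from the encoder
    labels:     (N,)  integer scene labels
    """
    z = F.normalize(embeddings, dim=1)                  # unit-norm embeddings
    sim = z @ z.T / temperature                         # (N, N) scaled cosine similarities
    self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float('-inf'))     # exclude self-pairs from the softmax
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    # Positives: other samples in the batch sharing the anchor's scene label.
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    pos_counts = pos_mask.sum(dim=1).clamp(min=1)       # avoid div-by-zero for lone labels
    loss = -(log_prob.masked_fill(~pos_mask, 0.0)).sum(dim=1) / pos_counts
    return loss.mean()
```

Pulling same-scene clips together and pushing different scenes apart is what imposes the semantic structure on the embedding space that later supports few-shot adaptation.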
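The distillation stage can be sketched in the same spirit. The following is a schematic InfoNCE-style loss that pulls each student embedding toward the frozen teacher's embedding of the same clip, using other clips in the batch as negatives; it assumes both models project to a shared embedding dimension, and the paper's contrastive representation distillation may differ in details such as negative sampling or memory buffers.

```python
import torch
import torch.nn.functional as F

def contrastive_distill_loss(student_emb, teacher_emb, temperature=0.1):
    """InfoNCE-style contrastive distillation: match each student
    embedding to the teacher embedding of the same clip, treating
    the other clips in the batch as negatives."""
    s = F.normalize(student_emb, dim=1)
    t = F.normalize(teacher_emb.detach(), dim=1)        # teacher is frozen
    logits = s @ t.T / temperature                      # (N, N) student-teacher similarities
    targets = torch.arange(len(s), device=s.device)     # matching pairs on the diagonal
    return F.cross_entropy(logits, targets)
```

Because the target is the teacher's relational structure rather than its class logits, the compact student inherits the organized embedding space instead of a fixed label set.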