This work investigates pretrained audio representations for few-shot sound event detection. We specifically address the task of few-shot detection of novel acoustic sequences, that is, sound events with semantically meaningful temporal structure, without assuming access to non-target audio. We develop procedures for pretraining suitable representations, and methods that transfer them to our few-shot learning scenario. Our experiments evaluate the general-purpose utility of our pretrained representations on AudioSet, and the utility of the proposed few-shot methods on tasks constructed from real-world acoustic sequences. Our pretrained embeddings are suitable for the proposed task and enable multiple aspects of our few-shot framework.