This technical report presents our frame-level embedding learning system for few-shot bioacoustic event detection (Task 5) of the DCASE 2024 Challenge. We extract log-mel and PCEN features from the input audio, use a Netmamba Encoder as the information interaction network, apply data augmentation to improve the generalizability of the trained model, and combine several post-processing methods. Our final system achieved an F-measure of 56.4%, ranking 2nd in the few-shot bioacoustic event detection task of the Detection and Classification of Acoustic Scenes and Events (DCASE) Challenge 2024.
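For readers unfamiliar with the two front-end features named above, the following is a minimal sketch of log compression and per-channel energy normalization (PCEN) applied to a mel-band energy matrix. It is an illustration under stated assumptions, not the report's implementation: the function names `pcen` and `log_compress`, the parameter values (`s`, `alpha`, `delta`, `r`, `eps`, `floor`), and the choice to initialise the smoother with the first frame are all assumptions, and the mel filterbank step is omitted.

```python
import math


def log_compress(energy, floor=1e-10):
    """Log-compress mel-band energies (frames x bands, nested lists).

    `floor` is an assumed lower bound that avoids log(0).
    """
    return [[math.log(max(e, floor)) for e in frame] for frame in energy]


def pcen(energy, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """Per-channel energy normalization over mel-band energies.

    For each band, a first-order IIR filter tracks a smoothed energy M:
        M[t] = (1 - s) * M[t-1] + s * E[t]
    and the output is
        (E[t] / (eps + M[t]) ** alpha + delta) ** r - delta ** r
    All parameter defaults are illustrative assumptions.
    """
    if not energy:
        return []
    smoothed = list(energy[0])  # assumption: start the smoother at frame 0
    out = []
    for frame in energy:
        # Update the per-band smoothed energy.
        smoothed = [(1 - s) * m + s * e for m, e in zip(smoothed, frame)]
        # Adaptive gain control followed by root compression.
        out.append([
            (e / (eps + m) ** alpha + delta) ** r - delta ** r
            for e, m in zip(frame, smoothed)
        ])
    return out


# Hypothetical input: 3 frames x 2 mel bands of non-negative energies.
mel_energy = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
log_mel = log_compress(mel_energy)
pcen_feat = pcen(mel_energy)
```

Because PCEN divides each band by its own smoothed energy, the output depends only weakly on the overall recording level, which is one common motivation for pairing it with log-mel features in bioacoustic recordings with varying loudness. How the two features are combined before the encoder is not stated here, so the sketch leaves them as separate matrices.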