While recent sound event detection (SED) systems can identify baleen whale calls in marine audio, challenges related to false-positive and minority-class detection persist. We propose the boundary proposal network (BPN), which extends an existing lightweight SED system. The BPN is inspired by work in image object detection and aims to reduce the number of false-positive detections. It does so by using intermediate latent representations computed within the backbone classification model to gate the final output. When added to an existing SED system, the BPN achieves a 16.8 % absolute increase in precision, as well as 21.3 % and 9.4 % improvements in the F1-score for the minority-class d-calls and bp-calls, respectively. We further consider two approaches to selecting post-processing hyperparameters: a forward search and a backward search. By optimising event-level and frame-level hyperparameters separately, both approaches yield considerable performance improvements over empirically selected parameters. The complete WhaleVAD-BPN system achieves a cross-validated development F1-score of 0.475, a 9.8 % absolute improvement over the baseline.
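The gating idea can be illustrated with a minimal NumPy sketch. It assumes that a hypothetical proposal head (`proposal_gate`, a single linear layer followed by a sigmoid) maps intermediate latent features to a per-frame probability that any call is present, and that this probability multiplies the classifier's frame-level class posteriors before thresholding. All function names, shapes, and values here are illustrative assumptions, not the BPN architecture itself.

```python
import numpy as np


def sigmoid(x):
    """Logistic function, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-x))


def proposal_gate(latent, w, b):
    """Hypothetical proposal head: map intermediate latents of shape (T, D)
    to a per-frame probability, shape (T,), that any call is present."""
    return sigmoid(latent @ w + b)


def gated_posteriors(class_probs, gate):
    """Scale each frame's class posteriors (T, C) by its gate value (T,).
    Because gate values lie in [0, 1], gated scores never exceed the originals."""
    return class_probs * gate[:, None]


def detect_frames(probs, threshold=0.5):
    """Frame-level decisions: True where a class score reaches the threshold."""
    return probs >= threshold


# Toy example: 4 frames, 3 call classes, 1-D latent features (all values made up).
class_probs = np.array([
    [0.9, 0.1, 0.2],
    [0.8, 0.6, 0.1],
    [0.7, 0.2, 0.9],  # confident classifier output on a frame the gate rejects
    [0.2, 0.1, 0.1],
])
latent = np.array([[2.0], [1.0], [-3.0], [0.0]])
gate = proposal_gate(latent, w=np.array([1.0]), b=0.0)
gated = gated_posteriors(class_probs, gate)

print(detect_frames(class_probs).sum())  # 5 frame-class detections without gating
print(detect_frames(gated).sum())        # 2 detections after gating
```

Because the gate can only scale scores down, gating at a fixed threshold can only remove detections. In this sketch it trades some recall for precision, which matches the stated aim of reducing false positives.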