While recent sound event detection (SED) systems can identify baleen whale calls in marine audio, false positives and weak minority-class detection remain challenges. We propose the boundary proposal network (BPN), which extends an existing lightweight SED system. Inspired by work in image object detection, the BPN aims to reduce the number of false positive detections by using intermediate latent representations computed within the backbone classification model to gate the final output. When added to an existing SED system, the BPN achieves a 16.8 % absolute increase in precision, as well as 21.3 % and 9.4 % improvements in the F1-score for minority-class d-calls and bp-calls, respectively. We further consider two approaches to selecting post-processing hyperparameters: a forward search and a backward search. By separately optimising event-level and frame-level hyperparameters, these two approaches lead to considerable performance improvements over empirically selected parameters. The complete WhaleVAD-BPN system achieves a cross-validated development F1-score of 0.475, a 9.8 % absolute improvement over the baseline.
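The gating idea can be illustrated with a minimal sketch. It assumes, purely for illustration, that the classifier emits per-frame class probabilities and that a proposal branch emits one score per frame; the function name `gate_frame_probs`, the hard threshold, and the multiplicative gate are all hypothetical and are not taken from the WhaleVAD-BPN implementation, which may gate differently.

```python
from typing import List


def gate_frame_probs(
    class_probs: List[List[float]],
    proposal_scores: List[float],
    threshold: float = 0.5,
) -> List[List[float]]:
    """Suppress class probabilities on frames the proposal branch rejects.

    class_probs: one list of per-class probabilities per frame (assumed layout).
    proposal_scores: one score in [0, 1] per frame (assumed layout).
    threshold: hypothetical cut-off below which a frame is gated out.
    """
    if len(class_probs) != len(proposal_scores):
        raise ValueError("class_probs and proposal_scores must cover the same frames")

    gated = []
    for frame_probs, score in zip(class_probs, proposal_scores):
        # Hard gate: keep the frame only if the proposal score clears the threshold.
        keep = 1.0 if score >= threshold else 0.0
        gated.append([p * keep for p in frame_probs])
    return gated


# Three frames, two call classes. The first frame has a confident class
# prediction but a low proposal score, so it is zeroed out.
class_probs = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.7]]
proposal_scores = [0.2, 0.9, 0.6]
gated = gate_frame_probs(class_probs, proposal_scores)
```

In this toy input the first frame is a would-be false positive: the classifier is confident, yet the gate removes it because the proposal score is low. That is the general route by which a gate of this kind can raise precision.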