Marine ecosystem monitoring via Passive Acoustic Monitoring (PAM) generates vast volumes of data, but deep learning methods typically require precise annotations and short audio segments. We introduce DSMIL-LocNet, a Multiple Instance Learning (MIL) framework for whale call detection and localization that uses only bag-level labels. Our dual-stream model processes 2-30 minute audio segments, combining spectral and temporal features with attention-based instance selection. Experiments on Antarctic whale data show that longer contexts improve classification (F1: 0.8-0.9), while medium-length instances preserve localization precision (0.65-0.70). These results suggest MIL can enable scalable marine monitoring. Code: https://github.com/Ragib-Amin-Nihal/DSMIL-Loc
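As a rough illustration of the attention-based instance selection the abstract refers to, the sketch below implements a generic attention MIL pooling step in NumPy. This is an assumption-laden toy, not the authors' DSMIL-LocNet code: the function name `attention_mil_pool`, the dimensions, and the random parameters are all illustrative, and in the real model the projection `V` and scoring vector `w` would be learned.

```python
import numpy as np

rng = np.random.default_rng(0)

def attention_mil_pool(instances, V, w):
    """Pool instance embeddings into one bag embedding via attention.

    instances: (n_instances, d) embeddings of audio windows within one bag.
    V: (d, h) projection; w: (h,) scoring vector (learned in practice).
    Returns the bag embedding and per-instance attention weights; the
    weights indicate which windows likely contain a call (localization).
    """
    scores = np.tanh(instances @ V) @ w       # (n_instances,) raw scores
    weights = np.exp(scores - scores.max())   # stable softmax numerator
    weights /= weights.sum()                  # normalize over instances
    bag = weights @ instances                 # (d,) attention-weighted sum
    return bag, weights

# Toy bag of 6 instance embeddings of dimension 8.
d, h, n = 8, 4, 6
H = rng.normal(size=(n, d))
bag, a = attention_mil_pool(H, rng.normal(size=(d, h)), rng.normal(size=(h,)))
```

With only a bag-level label ("call present somewhere in this segment"), the attention weights `a` provide a weak localization signal over the windows, which is the core idea behind MIL-based detection.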