Underwater video monitoring is a promising strategy for assessing marine biodiversity, but the vast volume of uneventful footage makes manual inspection highly impractical. In this work, we explore the use of visual anomaly detection (VAD) based on deep neural networks to automatically identify interesting or anomalous events. We introduce AURA, the first multi-annotator benchmark dataset for underwater VAD, and evaluate four VAD models across two marine scenes. We demonstrate the importance of robust frame selection strategies for extracting meaningful video segments. Our comparison against multiple annotators reveals that the performance of current VAD models varies dramatically and is highly sensitive to both the amount of training data and the variability in visual content that defines "normal" scenes. Our results highlight the value of soft and consensus labels and offer a practical approach for supporting scientific exploration and scalable biodiversity monitoring.
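As a rough illustration of the soft- and consensus-label idea, the following minimal Python sketch aggregates binary per-frame anomaly flags from three annotators; the arrays and the majority threshold are hypothetical assumptions for illustration, not the paper's exact procedure.

import numpy as np

# Hypothetical per-frame anomaly flags (rows: annotators, columns: frames).
annotations = np.array([
    [0, 1, 1, 0, 1],  # annotator 1
    [0, 1, 0, 0, 1],  # annotator 2
    [1, 1, 1, 0, 1],  # annotator 3
])

# Soft label: fraction of annotators marking each frame as anomalous.
soft_labels = annotations.mean(axis=0)              # [0.33, 1.0, 0.67, 0.0, 1.0]

# Consensus label: majority vote across annotators (0.5 threshold assumed).
consensus_labels = (soft_labels > 0.5).astype(int)  # [0, 1, 1, 0, 1]

Soft labels preserve annotator disagreement, which is useful for down-weighting ambiguous frames during evaluation, while consensus labels provide a single binary target for standard metrics.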