DINO-Explorer: Active Underwater Discovery via Ego-Motion Compensated Semantic Predictive Coding

Marine ecosystem degradation necessitates continuous, scientifically selective underwater monitoring. However, most autonomous underwater vehicles (AUVs) operate as passive data loggers, capturing exhaustive video for offline review and frequently missing transient events of high scientific value. Transitioning to active perception requires a causal, online signal that highlights significant phenomena while suppressing maneuver-induced visual changes. We propose DINO-Explorer, a novelty-aware perception framework driven by a continuous semantic surprise signal. Operating within the latent space of a frozen DINOv3 foundation model, it uses a lightweight, action-conditioned recurrent predictor to anticipate short-horizon semantic evolution. An efference-copy-inspired module uses globally pooled optical flow to discount self-induced visual changes without suppressing genuine environmental novelty. We evaluate this signal on the downstream task of asynchronous event triage under varying telemetry constraints. Results demonstrate that DINO-Explorer provides a robust, bandwidth-efficient attention mechanism. At a fixed operating point, the system retains 78.8% of post-discovery human-reviewer consensus events with a 56.8% trigger confirmation rate, effectively surfacing mission-relevant phenomena. Crucially, ego-motion conditioning suppresses 45.5% of false positives relative to an uncompensated surprise signal baseline. In a replay-side Pareto ablation study, DINO-Explorer dominates the frontier of validated peak F1 versus telemetry bandwidth, reducing telemetry bandwidth by 48.2% at the selected operating point while maintaining a 62.2% peak F1 score and concentrating data transmission around human-verified novelty events.
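
The interplay between the predictor, the pooled optical flow, and the surprise signal can be illustrated with a toy sketch. It is not the authors' implementation: random vectors stand in for the frozen DINOv3 embeddings, a fixed linear map `B` (hypothetical) stands in for the learned action-conditioned recurrent predictor, and cosine distance is assumed as the surprise measure. The function names `pooled_flow`, `predict_next`, and `surprise` are likewise illustrative.

```python
import numpy as np


def pooled_flow(flow_field):
    """Globally pool a dense (H, W, 2) optical-flow field into one 2-vector."""
    return flow_field.reshape(-1, 2).mean(axis=0)


def surprise(predicted, observed, eps=1e-8):
    """Surprise as cosine distance between predicted and observed embeddings
    (an assumed measure; the abstract does not specify one)."""
    denom = np.linalg.norm(predicted) * np.linalg.norm(observed) + eps
    return float(1.0 - np.dot(predicted, observed) / denom)


def predict_next(z_prev, ego, B=None):
    """Predict the next embedding.

    With B=None this is an uncompensated persistence baseline.
    With B given, the pooled flow `ego` acts as an efference copy that
    shifts the prediction (a linear stand-in for the learned predictor).
    """
    if B is None:
        return z_prev
    return z_prev + B @ ego


rng = np.random.default_rng(0)
dim = 16                              # toy embedding size, not DINOv3's
B = rng.normal(size=(dim, 2))         # hypothetical ego-motion-to-latent map
z_prev = rng.normal(size=dim)         # stand-in for the previous embedding

# Frame 1: pure maneuver. The whole image shifts uniformly, and in this toy
# world the embedding changes only because of the vehicle's own motion.
flow = np.tile(np.array([1.5, -0.5]), (8, 8, 1))
ego = pooled_flow(flow)
z_maneuver = z_prev + B @ ego

s_uncomp = surprise(predict_next(z_prev, ego), z_maneuver)
s_comp = surprise(predict_next(z_prev, ego, B), z_maneuver)

# Frame 2: same maneuver plus a genuine environmental change that the
# ego-motion model cannot explain.
z_novel = z_maneuver + 3.0 * rng.normal(size=dim)
s_novel = surprise(predict_next(z_prev, ego, B), z_novel)

print(f"maneuver only, uncompensated: {s_uncomp:.4f}")
print(f"maneuver only, compensated:   {s_comp:.4f}")
print(f"maneuver + novelty, compensated: {s_novel:.4f}")
```

In this toy setting the compensated surprise is near zero for the maneuver-only frame while the uncompensated baseline is not, and the compensated surprise rises again once an unexplained change appears. Thresholding such a signal is one plausible way to trigger event triage, though the operating points reported above come from the authors' system, not from this sketch.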
