Maritime vision research has grown rapidly in recent years, and much of this work is driven by applications of computer vision to Unmanned Surface Vehicles (USVs). Various sensor modalities, including cameras, radar, and lidar, have been used for tasks such as object detection, segmentation, object tracking, and motion planning. A large share of this research focuses on video analysis, since most current vessels carry onboard cameras for various surveillance tasks. Given the sheer volume of video data, video scene change detection is an initial and crucial stage of scene understanding for USVs. This paper presents our approach to detecting dynamic scene changes for USVs. To the best of our knowledge, this work is the first investigation of scene change detection in maritime vision applications. Our objective is to identify significant changes in dynamic scenes of maritime video data, particularly in scenes that closely resemble one another. For our dynamic scene change detection system, we propose a fully unsupervised learning method. In contrast to earlier studies, we train a modified version of the state-of-the-art generative image model VQ-VAE-2 on multiple marine datasets to enhance feature extraction. Next, we introduce a novel similarity scoring technique that directly computes the degree of similarity across a sequence of consecutive frames by applying a grid-based calculation to the extracted features. Experiments on the RoboWhaler nautical video dataset demonstrate the efficient performance of our technique.
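The abstract states only that similarity is computed by a grid calculation on extracted features; the exact formulation is not given here. As a rough illustration, the sketch below assumes each frame has already been encoded into an (H, W, C) feature map (for example by a VQ-VAE-2-style encoder, which is not reproduced). It splits each map into a hypothetical `grid x grid` layout, mean-pools each cell, compares corresponding cells of consecutive frames with cosine similarity, and averages the cell scores. The helper names (`grid_pool`, `grid_similarity`, `detect_scene_changes`), the grid size of 4, and the 0.9 change threshold are illustrative assumptions, not the authors' method.

```python
import numpy as np


def grid_pool(features: np.ndarray, grid: int) -> np.ndarray:
    """Mean-pool an (H, W, C) feature map into a (grid, grid, C) array of cell descriptors.

    Hypothetical helper: the paper's actual grid computation is not specified in the abstract.
    """
    h, w, c = features.shape
    if h < grid or w < grid:
        raise ValueError("feature map is smaller than the requested grid")
    # Split row and column indices into `grid` near-equal bands (remainders handled by array_split).
    row_bands = np.array_split(np.arange(h), grid)
    col_bands = np.array_split(np.arange(w), grid)
    pooled = np.empty((grid, grid, c), dtype=np.float64)
    for i, rows in enumerate(row_bands):
        for j, cols in enumerate(col_bands):
            # np.ix_ selects the rectangular cell; average over its spatial positions.
            pooled[i, j] = features[np.ix_(rows, cols)].mean(axis=(0, 1))
    return pooled


def grid_similarity(feat_a: np.ndarray, feat_b: np.ndarray, grid: int = 4, eps: float = 1e-8) -> float:
    """Average cosine similarity between corresponding grid cells of two feature maps."""
    if feat_a.shape != feat_b.shape:
        raise ValueError("feature maps must have the same shape")
    a = grid_pool(feat_a, grid).reshape(grid * grid, -1)
    b = grid_pool(feat_b, grid).reshape(grid * grid, -1)
    # Per-cell cosine similarity; eps guards against division by zero for all-zero cells.
    cos = (a * b).sum(axis=1) / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + eps)
    return float(cos.mean())


def detect_scene_changes(feature_seq, grid: int = 4, threshold: float = 0.9) -> list:
    """Return indices t where frame t is judged to start a new scene relative to frame t - 1.

    The threshold is an assumed illustrative value, not one reported in the paper.
    """
    changes = []
    for t in range(1, len(feature_seq)):
        if grid_similarity(feature_seq[t - 1], feature_seq[t], grid) < threshold:
            changes.append(t)
    return changes


if __name__ == "__main__":
    # Synthetic stand-ins for encoder features: two zero-mean random "scenes".
    rng = np.random.default_rng(0)
    scene_a = rng.standard_normal((16, 16, 8))
    scene_b = rng.standard_normal((16, 16, 8))
    frames = [scene_a, scene_a + 0.01 * rng.standard_normal(scene_a.shape), scene_a, scene_b, scene_b]
    print(detect_scene_changes(frames))
```

Scoring per cell rather than over the whole frame keeps a localized change (for instance, a vessel entering one region of an otherwise static seascape) from being washed out by the many unchanged cells; averaging the cell scores is one simple aggregation choice among several.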