Improving Underwater Visual Tracking With a Large Scale Dataset and Image Enhancement

This paper presents a new dataset and a general tracker enhancement method for Underwater Visual Object Tracking (UVOT). Despite its significance, underwater tracking has remained largely unexplored because data are hard to obtain. The underwater environment poses distinct challenges: non-uniform lighting, low visibility, lack of sharpness, low contrast, camouflage, and reflections from suspended particles. The performance of traditional tracking methods, designed primarily for terrestrial or open-air scenarios, drops under such conditions. We address the problem by proposing a novel underwater image enhancement algorithm designed specifically to boost tracking quality. The method improves the performance of state-of-the-art (SOTA) visual trackers by up to 5.0% AUC. Developing robust and accurate UVOT methods also requires large-scale datasets. To this end, we introduce a large-scale UVOT benchmark dataset consisting of 400 video segments and 275,000 manually annotated frames, enabling underwater training and evaluation of deep trackers. The videos are labelled with several underwater-specific tracking attributes, including water color variation, target distractors, camouflage, target relative size, and low visibility conditions. The UVOT400 dataset, tracking results, and code are publicly available at https://github.com/BasitAlawode/UWVOT400.
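The abstract reports its gain in AUC without defining the metric. In single-object tracking, AUC conventionally means the area under the success plot: the fraction of frames whose predicted box overlaps the ground truth by more than a threshold, averaged over a sweep of thresholds. The sketch below illustrates that convention only. It assumes boxes in `(x, y, width, height)` form and 21 evenly spaced thresholds on [0, 1], as in OTB-style evaluation; the function names are hypothetical and are not taken from the UVOT400 code.

```python
from typing import List, Sequence, Tuple

# Assumed box layout: (x, y, width, height), top-left corner plus size.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Width and height of the overlap region, clamped at zero.
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def success_auc(pred: Sequence[Box], gt: Sequence[Box], steps: int = 21) -> float:
    """Mean success rate over `steps` evenly spaced IoU thresholds in [0, 1]."""
    if len(pred) != len(gt) or len(gt) == 0:
        raise ValueError("pred and gt must be non-empty and of equal length")
    overlaps: List[float] = [iou(p, g) for p, g in zip(pred, gt)]
    thresholds = [i / (steps - 1) for i in range(steps)]
    # Success rate at a threshold: share of frames with overlap strictly above it.
    rates = [sum(o > t for o in overlaps) / len(overlaps) for t in thresholds]
    return sum(rates) / len(rates)
```

Under this convention, an improvement of "5.0% AUC" would correspond to a rise of 0.05 in the value returned by `success_auc`, averaged over the benchmark sequences. Whether the authors use exactly these thresholds is not stated in the abstract.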

