Small Video Object Detection (SVOD) is a crucial subfield of modern computer vision, essential for the early discovery and detection of objects. However, existing SVOD datasets are scarce and suffer from issues such as insufficiently small objects, limited object categories, and a lack of scene diversity, leaving the corresponding methods with narrow application scenarios. To address this gap, we develop the XS-VID dataset, which comprises aerial data from various periods and scenes and annotates eight major object categories. To further evaluate existing methods on extremely small objects, XS-VID extensively collects three types of objects with small pixel areas: extremely small (\textit{es}, $0\sim12^2$), relatively small (\textit{rs}, $12^2\sim20^2$), and generally small (\textit{gs}, $20^2\sim32^2$). XS-VID offers unprecedented breadth and depth in covering and quantifying minuscule objects, significantly enriching scene and object diversity. Extensive validation on XS-VID and the publicly available VisDrone2019VID dataset shows that existing methods struggle with small object detection and significantly underperform general object detectors. Leveraging the strengths of previous methods and addressing their weaknesses, we propose YOLOFT, which enhances local feature associations and integrates temporal motion features, significantly improving the accuracy and stability of SVOD. Our datasets and benchmarks are available at \url{https://gjhhust.github.io/XS-VID/}.
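The three size bins above are defined by pixel area. A minimal sketch of how a box might be assigned to a bin, assuming the interval endpoints are inclusive on the upper side (the abstract does not state boundary conventions, and the helper name `size_category` is hypothetical):

```python
def size_category(width: float, height: float) -> str:
    """Assign a bounding box to one of XS-VID's small-object bins by pixel area.

    Thresholds follow the abstract: es (0 to 12^2), rs (12^2 to 20^2),
    gs (20^2 to 32^2); larger boxes fall outside the "small" bins.
    Upper-inclusive boundaries are an assumption for illustration.
    """
    area = width * height
    if area <= 12 ** 2:    # 0 ~ 144 px^2
        return "es"        # extremely small
    if area <= 20 ** 2:    # 144 ~ 400 px^2
        return "rs"        # relatively small
    if area <= 32 ** 2:    # 400 ~ 1024 px^2
        return "gs"        # generally small
    return "not-small"
```

For example, a 10x10 box (100 px^2) falls in \textit{es}, while a 25x25 box (625 px^2) falls in \textit{gs}.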