Synthetic Data-based Detection of Zebras in Drone Imagery

Many datasets are now available for training detectors of common objects or humans. These consist of labelled real-world images and require either significant human effort, with a high probability of errors such as missing labels, or very constrained scenarios, e.g. VICON systems. In contrast, data covering uncommon scenarios such as aerial views, animals such as wild zebras, or difficult-to-obtain information such as human shapes is scarce. To overcome this, synthetic data generation with realistic rendering technologies has recently gained traction and has advanced research areas such as target tracking and human pose estimation. However, subjects such as wild animals are still usually not well represented in such datasets. In this work, we first show that a pre-trained YOLO detector cannot identify zebras in real images recorded from aerial viewpoints. To solve this, we present an approach for training an animal detector using only synthetic data. We start by generating a novel synthetic zebra dataset using GRADE, a state-of-the-art framework for data generation. The dataset includes RGB, depth, skeletal joint locations, pose, shape and instance segmentations for each subject. We use this dataset to train a YOLO detector from scratch. Through extensive evaluations of our model on real-world data from i) limited datasets available on the internet and ii) a new dataset collected and manually labelled by us, we show that zebras can be detected using only synthetic data during training. The code, results, trained models, and both the generated and training data are provided as open-source at https://eliabntt.github.io/grade-rr.
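As an illustration of how synthetic instance segmentations could be turned into training labels for a YOLO-style detector, the minimal sketch below derives one normalized bounding box per instance from a mask. It is not taken from the released code: the function name `masks_to_yolo_labels`, the mask encoding (a 2D grid of integers with 0 as background and a positive id per zebra), and the single class id 0 are all assumptions made for this example. The output follows the common YOLO text label convention of `class x_center y_center width height`, with coordinates normalized to the image size.

```python
from typing import Dict, List


def masks_to_yolo_labels(instance_mask: List[List[int]], class_id: int = 0) -> List[str]:
    """Convert an instance mask into YOLO-style label lines.

    Assumed (hypothetical) encoding: instance_mask[y][x] is 0 for background
    and a positive integer id for each distinct animal instance.
    """
    height = len(instance_mask)
    width = len(instance_mask[0])

    # Track the inclusive pixel extent [xmin, ymin, xmax, ymax] of each instance.
    extents: Dict[int, List[int]] = {}
    for y, row in enumerate(instance_mask):
        for x, instance_id in enumerate(row):
            if instance_id == 0:
                continue  # background pixel
            if instance_id not in extents:
                extents[instance_id] = [x, y, x, y]
            else:
                box = extents[instance_id]
                box[0] = min(box[0], x)
                box[1] = min(box[1], y)
                box[2] = max(box[2], x)
                box[3] = max(box[3], y)

    lines = []
    for instance_id in sorted(extents):
        xmin, ymin, xmax, ymax = extents[instance_id]
        # Extents are inclusive, so a box spans (max - min + 1) pixels.
        box_w = (xmax - xmin + 1) / width
        box_h = (ymax - ymin + 1) / height
        center_x = (xmin + xmax + 1) / 2 / width
        center_y = (ymin + ymax + 1) / 2 / height
        lines.append(f"{class_id} {center_x:.6f} {center_y:.6f} {box_w:.6f} {box_h:.6f}")
    return lines


if __name__ == "__main__":
    # Toy 4x4 mask with two instances, standing in for a rendered frame.
    toy_mask = [
        [1, 1, 0, 0],
        [1, 1, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 2],
    ]
    for line in masks_to_yolo_labels(toy_mask):
        print(line)
```

Because the boxes come directly from rendered masks, labels produced this way would not suffer from the missing-label errors mentioned above for manual annotation. A real pipeline would likely also need to filter out heavily occluded or very small instances.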

