Datasets for training common object detectors or human detectors are now widely available. They consist of labelled real-world images and require either a significant amount of human effort, with a high probability of errors such as missing labels, or very constrained scenarios, e.g. VICON systems. In contrast, data is scarce for uncommon scenarios such as aerial views, for animals such as wild zebras, and for difficult-to-obtain information such as human shapes. To overcome this, synthetic data generation with realistic rendering technologies has recently gained traction and has advanced research areas such as target tracking and human pose estimation. However, subjects such as wild animals are still usually not well represented in these synthetic datasets. In this work, we first show that a pre-trained YOLO detector cannot identify zebras in real images recorded from aerial viewpoints. To solve this, we present an approach for training an animal detector using only synthetic data. We start by generating a novel synthetic zebra dataset using GRADE, a state-of-the-art framework for data generation. The dataset includes RGB images, depth, skeletal joint locations, pose, shape, and instance segmentations for each subject. We use this dataset to train a YOLO detector from scratch. Through extensive evaluations of our model on real-world data from i) the limited datasets available on the internet and ii) a new dataset collected and manually labelled by us, we show that zebras can be detected using only synthetic data during training. The code, results, trained models, and both the generated and training data are provided as open source at https://eliabntt.github.io/grade-rr.