3D Annotation Of Arbitrary Objects In The Wild

From arXiv: 6 pages, 4 figures. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

Recent years have produced a variety of learning-based methods in computer vision and robotics. Most of the recently proposed methods are based on deep learning, which requires far larger amounts of data than traditional methods. The performance of deep learning methods depends largely on the distribution of the data they were trained on, so it is important to train on data from the robot's actual operating domain. Therefore, it is not possible to rely on pre-built, generic datasets when deploying robots in real environments. This creates a need for efficient data collection and annotation under the specific conditions in which the robots will operate. The challenge is then: how do we reduce the cost of obtaining such datasets to the point where we can easily deploy our robots in new conditions and environments, and support new sensors? As an answer to this question, we propose a data annotation pipeline based on SLAM, 3D reconstruction, and 3D-to-2D geometry. The pipeline can create 3D and 2D bounding boxes, along with per-pixel annotations, of arbitrary objects without requiring accurate 3D models of the objects prior to data collection and annotation. Our results show almost 90% Intersection-over-Union (IoU) agreement on both semantic segmentation and 2D bounding box detection across a variety of objects and scenes, while speeding up the annotation process by several orders of magnitude compared to traditional manual annotation.
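The abstract does not spell out how 3D annotations become 2D labels or how agreement is scored. The sketch below illustrates the general 3D-to-2D idea and the IoU metric under stated assumptions; it is not the authors' implementation. It assumes a pinhole camera with known intrinsics (fx, fy, cx, cy), a world-to-camera pose (R, t) such as one estimated by SLAM, and an axis-aligned 3D box in the world frame. All function names (box_corners, project_point, box_3d_to_2d, iou, mask_iou) are hypothetical.

```python
from itertools import product


def box_corners(center, size):
    """Return the 8 corners of a 3D box in world coordinates.

    Assumption (hypothetical simplification): the box is axis-aligned in the
    world frame; an oriented box would also need a rotation per corner.
    """
    cx, cy, cz = center
    hx, hy, hz = (s / 2.0 for s in size)
    return [(cx + sx * hx, cy + sy * hy, cz + sz * hz)
            for sx, sy, sz in product((-1, 1), repeat=3)]


def project_point(p_world, R, t, fx, fy, cx, cy):
    """Project a world point with a pinhole camera.

    R (3x3 nested lists) and t (3-vector) map world to camera coordinates,
    p_cam = R @ p_world + t, e.g. from a SLAM pose estimate (assumed given).
    Returns None if the point lies on or behind the image plane.
    """
    x, y, z = (sum(R[i][j] * p_world[j] for j in range(3)) + t[i]
               for i in range(3))
    if z <= 0:
        return None
    return (fx * x / z + cx, fy * y / z + cy)


def _clamp(v, lo, hi):
    return min(max(v, lo), hi)


def box_3d_to_2d(center, size, R, t, fx, fy, cx, cy, width, height):
    """2D box (x_min, y_min, x_max, y_max) enclosing the projected corners.

    The result is clipped to the image. Returns None if any corner is behind
    the camera (a simplification; near-plane clipping is not handled).
    """
    pts = [project_point(c, R, t, fx, fy, cx, cy)
           for c in box_corners(center, size)]
    if any(p is None for p in pts):
        return None
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return (_clamp(min(xs), 0.0, width), _clamp(min(ys), 0.0, height),
            _clamp(max(xs), 0.0, width), _clamp(max(ys), 0.0, height))


def iou(a, b):
    """Intersection-over-Union of two (x_min, y_min, x_max, y_max) boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mask_iou(mask_a, mask_b):
    """IoU of two same-shaped binary masks given as nested lists of bools."""
    inter = union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            inter += bool(a and b)
            union += bool(a or b)
    return inter / union if union else 0.0


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    # A 1 m cube centred 5 m in front of a 640x480 camera at the origin.
    box = box_3d_to_2d((0.0, 0.0, 5.0), (1.0, 1.0, 1.0), identity,
                       (0.0, 0.0, 0.0), 500.0, 500.0, 320.0, 240.0, 640, 480)
    print(box)
    print(iou(box, box))  # prints 1.0
```

Because the 3D box is labelled once in the reconstructed map, the same box can be projected into every frame with a known pose. This is one plausible reason a SLAM-based pipeline can cut manual effort so sharply. Comparing such projected labels with manual ones using `iou` or `mask_iou` yields the kind of agreement score the abstract reports.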