Deep learning requires large amounts of data and a well-defined pipeline for labeling and augmentation. Current solutions support numerous computer vision tasks, each with dedicated annotation types and formats such as bounding boxes, polygons, and key points. These annotations can be combined into a single data format, which benefits approaches such as multi-task models. However, to our knowledge, no available labeling tool can export such a combined benchmark format, and no augmentation library supports transformations for all of these annotation types together. In this work, we present both functionalities: visual data annotation and augmentation for training a multi-task model (object detection, segmentation, and key point extraction). The tools are demonstrated in two robot perception use cases.
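The central requirement named above is that a single geometric augmentation must update every annotation type of an object consistently. The following is a minimal sketch of that idea for a horizontal flip, assuming a hypothetical combined record with pixel coordinates. `MultiTaskAnnotation`, `hflip`, and the `flip_pairs` parameter are illustrative names, not the authors' tool or any existing library API.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


@dataclass
class MultiTaskAnnotation:
    """Hypothetical combined record: one object with a box, a polygon, and key points."""
    label: str
    bbox: Tuple[float, float, float, float]    # (x_min, y_min, x_max, y_max) in pixels
    polygon: List[Point]                       # segmentation outline as (x, y) vertices
    keypoints: List[Tuple[float, float, int]]  # (x, y, visibility); visibility 0 = not labeled


def hflip(ann: MultiTaskAnnotation, image_width: float,
          flip_pairs: Sequence[Tuple[int, int]] = ()) -> MultiTaskAnnotation:
    """Mirror all annotation types of one object about the vertical image axis."""
    x_min, y_min, x_max, y_max = ann.bbox
    # Mirroring swaps the left and right box edges, so x_max becomes the new x_min.
    bbox = (image_width - x_max, y_min, image_width - x_min, y_max)
    # Mirroring reverses vertex winding; reversing the list restores the original winding.
    polygon = [(image_width - x, y) for x, y in reversed(ann.polygon)]
    # Only labeled key points are moved; unlabeled placeholders keep their coordinates.
    keypoints = [(image_width - x, y, v) if v > 0 else (x, y, v)
                 for x, y, v in ann.keypoints]
    # Semantically left/right key points (e.g. two gripper tips) must trade indices.
    for i, j in flip_pairs:
        keypoints[i], keypoints[j] = keypoints[j], keypoints[i]
    return MultiTaskAnnotation(ann.label, bbox, polygon, keypoints)
```

For example, flipping a square object at `(10, 20, 50, 60)` in a 100-pixel-wide image yields the box `(50, 20, 90, 60)`, and its polygon and key points move with it. Keeping all three annotation types in one record means a single transform call cannot leave them out of sync, which is the property a combined format needs from an augmentation library.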