In recent years, powerful data-driven deep-learning techniques have been developed and applied to automated catch registration. However, these methods depend on labelled data, which is time-consuming, labour-intensive and expensive to collect, and which requires expert knowledge. In this study, we present an active learning technique, named BoxAL, which includes estimation of the epistemic certainty of the Faster R-CNN object-detection model. The method selects the most uncertain training images from an unlabelled pool, which are then used to train the object-detection model. To evaluate the method, we used an open-source image dataset obtained with a dedicated image-acquisition system developed for commercial trawlers targeting demersal species. We demonstrated that our approach reaches the same object-detection performance as random sampling with 400 fewer labelled images. In addition, the mean AP score was significantly higher at the last training iteration with 1100 training images: 39.0±1.6 for certainty-based sampling versus 34.8±1.8 for random sampling. We also showed that epistemic certainty is a suitable criterion for sampling images that the current iteration of the model cannot yet handle, and that the newly sampled data is more valuable for training than the remaining unlabelled data. Our software is available at https://github.com/pieterblok/boxal.
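The selection step described above, picking the least certain images from the unlabelled pool and moving them to the training set, can be illustrated with a minimal sketch. This is not the BoxAL implementation: the function names `select_least_certain` and `active_learning_round` are hypothetical, and `certainty_fn` stands in for whatever per-image certainty estimate the detector provides, which the abstract does not specify.

```python
from typing import Callable, List, Sequence, Tuple


def select_least_certain(
    scores: Sequence[float], pool: Sequence[str], batch_size: int
) -> List[str]:
    """Return the `batch_size` pool items with the lowest certainty score."""
    if len(scores) != len(pool):
        raise ValueError("scores and pool must have the same length")
    # Sort indices by ascending certainty: least certain images come first.
    order = sorted(range(len(pool)), key=lambda i: scores[i])
    return [pool[i] for i in order[:batch_size]]


def active_learning_round(
    labelled: List[str],
    pool: List[str],
    certainty_fn: Callable[[str], float],  # hypothetical: image id -> certainty
    batch_size: int,
) -> Tuple[List[str], List[str]]:
    """Move the least certain pool images into the labelled set.

    In a real pipeline the picked images would be annotated and the
    detector retrained before the next round; both steps are omitted here.
    """
    scores = [certainty_fn(image_id) for image_id in pool]
    picked = select_least_certain(scores, pool, batch_size)
    picked_set = set(picked)
    remaining = [image_id for image_id in pool if image_id not in picked_set]
    return labelled + picked, remaining


if __name__ == "__main__":
    # Toy certainty values, made up purely for illustration.
    toy_certainty = {"img_a": 0.9, "img_b": 0.2, "img_c": 0.5, "img_d": 0.1}
    train, rest = active_learning_round(
        [], list(toy_certainty), toy_certainty.get, batch_size=2
    )
    print(train, rest)
```

A random-sampling baseline would replace `select_least_certain` with a uniform draw from the pool, which is the comparison the reported AP scores refer to.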