Object detection is one of the most fundamental tasks in computer vision and is widely used in pose estimation, object tracking, and instance segmentation models. To collect training data for object detection models efficiently, many datasets record their unannotated data as video, and annotators must then draw a bounding box around each object in each frame. Annotating every frame of a video is costly and inefficient, because many frames contain very similar information for the model to learn from. Selecting the most informative frames of a video to annotate is therefore a highly practical problem, yet it has attracted little research attention. In this paper, we propose a novel active learning algorithm for object detection models to tackle this problem. The proposed algorithm measures and aggregates both the classification and the localization informativeness of unlabelled data. Exploiting the temporal information in video frames, we propose two novel localization informativeness measurements. Furthermore, we propose a weight curve that prevents the algorithm from querying adjacent frames. The proposed active learning algorithm was evaluated in multiple configurations on the MuPoTS and FootballPD datasets.
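The abstract does not define the two localization measurements or the shape of the weight curve, so the following is only a minimal sketch of the general selection idea, not the paper's method. It assumes that per-frame classification and localization scores are already computed and non-negative, that they are aggregated by a convex combination with a hypothetical parameter `alpha`, and that the weight curve is a hypothetical Gaussian-shaped function of the frame distance to the nearest already-selected frame (parameter `sigma`). The names `weight_curve` and `select_frames` are illustrative.

```python
import math
from typing import List, Sequence


def weight_curve(distance: int, sigma: float = 3.0) -> float:
    """Hypothetical weight curve: 0 for a frame that is already selected,
    close to 0 for adjacent frames, and rising towards 1 with distance."""
    return 1.0 - math.exp(-(distance ** 2) / (2.0 * sigma ** 2))


def select_frames(cls_scores: Sequence[float], loc_scores: Sequence[float],
                  budget: int, alpha: float = 0.5, sigma: float = 3.0) -> List[int]:
    """Greedily choose up to `budget` frame indices to send for annotation.

    Assumption: informativeness is alpha * classification score plus
    (1 - alpha) * localization score. Each candidate's informativeness is
    then scaled by the weight curve of its distance to the nearest frame
    already chosen in this round, which discourages picking neighbours.
    """
    if len(cls_scores) != len(loc_scores):
        raise ValueError("score sequences must have equal length")
    info = [alpha * c + (1.0 - alpha) * l for c, l in zip(cls_scores, loc_scores)]
    selected: List[int] = []
    for _ in range(min(budget, len(info))):
        best_idx, best_val = -1, -math.inf
        for i, value in enumerate(info):
            if i in selected:
                continue
            if selected:
                nearest = min(abs(i - j) for j in selected)
                value *= weight_curve(nearest, sigma)
            if value > best_val:
                best_idx, best_val = i, value
        selected.append(best_idx)
    return selected


# Frames 0 and 1 are adjacent and both highly informative.
cls = [0.9, 0.85, 0.1, 0.2, 0.8, 0.3]
loc = [0.8, 0.9, 0.1, 0.1, 0.7, 0.2]
print(select_frames(cls, loc, budget=2, sigma=1.0))  # [1, 4]
```

Ranking by aggregated informativeness alone would query frames 1 and 0, which are neighbours and likely near-duplicates; with the weight curve applied, the second query moves to frame 4. The greedy loop is a simple design choice for illustration; other batch-selection strategies are possible.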