In this paper, we introduce the FOCAL (Ford-OLIVES Collaboration on Active Learning) dataset, which enables the study of the impact of annotation-cost within a video active learning setting. Annotation-cost refers to the time it takes an annotator to label and quality-assure a given video sequence. A practical motivation for active learning research is to minimize annotation-cost by selectively labeling the informative samples that maximize performance within a given budget constraint. However, previous work in video active learning lacks real-time annotation labels for accurately assessing cost minimization and instead assumes that annotation-cost scales linearly with the amount of data to annotate. This assumption ignores real-world confounding factors that make cost nonlinear, such as the effect of an assistive labeling tool and the variety of interactions within a scene, including occluded objects, weather, and object motion. FOCAL addresses this discrepancy by providing real annotation-cost labels for 126 video sequences across 69 unique city scenes with a variety of weather, lighting, and seasonal conditions. We also introduce a set of conformal active learning algorithms that take advantage of the sequential structure of video data to achieve a better trade-off between annotation-cost and performance while also reducing floating-point operations (FLOPs) overhead by at least 77.67%. Through a sequence selection framework, we show how these approaches better reflect how videos are annotated in practice. We further demonstrate the advantage of these approaches by introducing two performance-cost metrics and show that the best conformal active learning method is cheaper than the best traditional active learning method by 113 hours.
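The gap between the linear-cost assumption and recorded annotation time can be made concrete with a small sketch. The code below is illustrative only and is not the conformal algorithms introduced in the paper: it assumes a hypothetical `Sequence` record holding a recorded annotation time, compares that with a frame-proportional estimate, and greedily selects whole sequences by an informativeness score until a budget in hours is exhausted. All names, scores, and numbers are assumptions made for illustration, not values from FOCAL.

```python
from dataclasses import dataclass


@dataclass
class Sequence:
    """Hypothetical record for one video sequence (not the FOCAL schema)."""
    name: str
    num_frames: int
    annotation_hours: float  # recorded time to label and quality-assure


def linear_cost(seqs, hours_per_frame):
    """Cost under the assumption that cost scales linearly with data amount."""
    return sum(s.num_frames * hours_per_frame for s in seqs)


def recorded_cost(seqs):
    """Cost taken from the recorded per-sequence annotation time."""
    return sum(s.annotation_hours for s in seqs)


def select_within_budget(seqs, scores, budget_hours):
    """Greedily pick the highest-scoring sequences whose recorded cost fits.

    `scores` stands in for any informativeness measure; sequences that
    would exceed the remaining budget are skipped.
    """
    ranked = sorted(zip(seqs, scores), key=lambda pair: pair[1], reverse=True)
    chosen, spent = [], 0.0
    for seq, _ in ranked:
        if spent + seq.annotation_hours <= budget_hours:
            chosen.append(seq)
            spent += seq.annotation_hours
    return chosen, spent


if __name__ == "__main__":
    # Made-up pool: two sequences of equal length with very different cost.
    pool = [
        Sequence("clear_highway", 300, 1.0),
        Sequence("rainy_intersection", 300, 4.5),
        Sequence("night_parking", 150, 2.0),
    ]
    for seq in pool:
        print(seq.name, linear_cost([seq], 0.01), recorded_cost([seq]))
    chosen, spent = select_within_budget(pool, [0.2, 0.9, 0.5], budget_hours=3.5)
    print([s.name for s in chosen], spent)
```

In this made-up pool, the two 300-frame sequences receive the same linear estimate even though their recorded times differ, which is the kind of discrepancy that recorded annotation-cost labels make measurable.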