Constructing supervised machine learning models for real-world video analysis requires substantial labeled data, which is costly to acquire because domain expertise is scarce and manual inspection is laborious. While data programming shows promise in generating labeled data at scale with user-defined labeling functions, the high-dimensional and complex temporal information in videos poses additional challenges for composing and evaluating labeling functions effectively. In this paper, we propose VideoPro, a visual analytics approach that supports flexible and scalable video data programming for model steering with reduced human effort. We first extract human-understandable events from videos using computer vision techniques and treat them as the atomic components of labeling functions. We further propose a two-stage template mining algorithm that characterizes the sequential patterns of these events, which then serve as labeling function templates for efficient data labeling. The visual interface of VideoPro facilitates multifaceted exploration, examination, and application of the labeling templates, allowing for effective programming of video data at scale. Moreover, users can monitor the impact of programming on model performance and make informed adjustments during the iterative programming process. We demonstrate the efficiency and effectiveness of our approach with two case studies and expert interviews.
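To make the idea of event-based labeling functions concrete, the following is a minimal sketch, not VideoPro's actual implementation. It assumes each video has already been reduced to an ordered list of event names, and that a template is an ordered list of events that must appear in that order, though not necessarily contiguously. The helper `make_template_lf`, the event names, and the label `takes_item` are all hypothetical; the two-stage template mining algorithm itself is not shown.

```python
from typing import Callable, Optional, Sequence

# Returned when a labeling function casts no vote for a video.
ABSTAIN = None


def make_template_lf(
    template: Sequence[str], label: str
) -> Callable[[Sequence[str]], Optional[str]]:
    """Build a labeling function from an event template (hypothetical helper).

    The function returns `label` when `template` occurs as an ordered,
    not necessarily contiguous, subsequence of a video's events, and
    ABSTAIN otherwise.
    """

    def lf(events: Sequence[str]) -> Optional[str]:
        remaining = iter(events)
        # Each template step consumes events until it finds a match, so
        # later steps can only match events that occur after earlier ones.
        matched = all(
            any(event == step for event in remaining) for step in template
        )
        return label if matched else ABSTAIN

    return lf


# Hypothetical event sequences, standing in for the output of an event
# extraction stage.
videos = {
    "clip_a": ["enter", "pick_up", "walk", "exit"],
    "clip_b": ["enter", "exit"],
    "clip_c": ["pick_up", "enter", "exit"],
}

lf_takes_item = make_template_lf(["enter", "pick_up", "exit"], "takes_item")
labels = {name: lf_takes_item(events) for name, events in videos.items()}
print(labels)
```

Only `clip_a` contains the three events in the required order, so it is the only clip that receives a label; the other two are left for other labeling functions or for manual review. Abstaining rather than assigning a negative label follows common data programming practice, where each function votes only on the cases it covers.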