This work introduces a dataset, benchmark, and challenge for the problem of video copy detection and localization. The problem comprises two distinct but related tasks: determining whether a query video shares content with a reference video ("detection"), and additionally temporally localizing the shared content within each video ("localization"). The benchmark is designed to evaluate methods on these two tasks and simulates a realistic needle-in-haystack setting, where the majority of both query and reference videos are "distractors" containing no copied content. We propose a metric that reflects both detection and localization accuracy. The associated challenge consists of two corresponding tracks, each with restrictions that reflect real-world settings. We provide implementation code for evaluation and baselines. We also analyze the results and methods of the top submissions to the challenge. The dataset, baseline methods, and evaluation code are publicly available and will be discussed at a dedicated CVPR'23 workshop.