Shot Boundary Detection (SBD) aims to automatically identify shot changes and divide a video into coherent shots. Although SBD has been widely studied, existing state-of-the-art methods often produce non-interpretable boundaries on transitions, miss subtle yet harmful discontinuities, and rely on noisy, low-diversity annotations and outdated benchmarks. To alleviate these limitations, we propose OmniShotCut, which formulates SBD as structured relational prediction: a shot-query-based dense video Transformer jointly estimates shot ranges together with intra-shot and inter-shot relations. To avoid imprecise manual labeling, we adopt a fully synthetic transition pipeline that automatically reproduces major transition families with precise boundaries and parameterized variants. We also introduce OmniShotCutBench, a modern wide-domain benchmark that enables holistic and diagnostic evaluation.