Modern video segmentation methods use object queries for inter-frame association and track continuously appearing objects well, even under large-scale motion and transient occlusion. However, they all underperform on newly emerging and disappearing objects, which are common in the real world, because they model emergence and disappearance as feature transitions between background and foreground queries, and these queries are separated by a significant feature gap. We introduce Dynamic Anchor Queries (DAQ), which shorten the transition gap between anchor and target queries by generating the anchor queries dynamically from the features of potential candidates. We further introduce a query-level object Emergence and Disappearance Simulation (EDS) strategy, which unleashes DAQ's potential without any additional cost. Finally, we combine DAQ and EDS with DVIS to obtain DVIS-DAQ. Extensive experiments demonstrate that DVIS-DAQ achieves new state-of-the-art (SOTA) performance on five mainstream video segmentation benchmarks. Code and models are available at \url{https://github.com/SkyworkAI/DAQ-VS}.