Modern video segmentation methods adopt object queries to perform inter-frame association and demonstrate satisfactory performance in tracking continuously appearing objects despite large-scale motion and transient occlusion. However, they all underperform on newly emerging and disappearing objects, which are common in the real world, because they attempt to model object emergence and disappearance through feature transitions between background and foreground queries, between which there is a significant feature gap. We introduce Dynamic Anchor Queries (DAQ) to shorten the transition gap between the anchor and target queries by dynamically generating anchor queries based on the features of potential candidates. Furthermore, we introduce a query-level object Emergence and Disappearance Simulation (EDS) strategy, which unleashes DAQ's potential without any additional cost. Finally, we combine our proposed DAQ and EDS with DVIS to obtain DVIS-DAQ. Extensive experiments demonstrate that DVIS-DAQ achieves new state-of-the-art (SOTA) performance on five mainstream video segmentation benchmarks. Code and models are available at \url{https://github.com/SkyworkAI/DAQ-VS}.