Feature similarity matching, which transfers the information of the reference frame to the query frame, is a key component in semi-supervised video object segmentation. With surjective matching, background distractors can easily arise and degrade performance. Bijective matching mechanisms try to prevent this by restricting the amount of information transferred to the query frame, but they have two limitations: 1) surjective matching cannot be fully leveraged, as it is transformed to bijective matching at test time; and 2) manual tuning is required at test time to search for the optimal hyper-parameters. To overcome these limitations while ensuring reliable information transfer, we introduce an equalized matching mechanism. To prevent the reference frame information from being overly referenced, the potential contribution of each reference frame feature to the query frame is equalized by simply applying a softmax operation along the query dimension. On public benchmark datasets, our proposed approach achieves performance comparable to state-of-the-art methods.
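The equalization step described above can be illustrated with a minimal sketch. It assumes a similarity matrix indexed as `similarity[r][q]` (reference pixel `r`, query pixel `q`) and a surjective readout that takes the maximum over reference pixels for each query pixel. The function names, the toy scores, and the max-based readout are illustrative assumptions, not the authors' implementation; the abstract only states that a softmax is applied along the query dimension.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def equalize(similarity):
    """Apply softmax along the query dimension.

    similarity[r][q] is the score between reference pixel r and query
    pixel q. After this step every reference pixel distributes a total
    contribution of 1 across the query pixels.
    """
    return [softmax(row) for row in similarity]


def surjective_match(scores):
    """For each query pixel, keep the best score over all reference pixels.

    This max-based readout is an assumption made for the sketch.
    """
    num_query = len(scores[0])
    return [max(row[q] for row in scores) for q in range(num_query)]


def best_reference(scores, q):
    """Index of the reference pixel with the highest score for query pixel q."""
    return max(range(len(scores)), key=lambda r: scores[r][q])


# Hypothetical toy scores: reference pixel 0 resembles every query pixel
# (distractor-prone), reference pixel 1 resembles only query pixel 2.
similarity = [
    [0.9, 0.8, 0.7],
    [0.1, 0.2, 0.6],
]

equalized = equalize(similarity)
matched = surjective_match(equalized)
```

In this toy example, reference pixel 0 is the best match for every query pixel before equalization. After the softmax along the query dimension, reference pixel 1 becomes the best match for query pixel 2, because reference pixel 0 has to spread its fixed contribution across all three query pixels.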