We propose a novel approach for modeling semantic contextual relationships in videos. This graph-based model enables the learning and propagation of higher-level spatial-temporal contexts to facilitate the semantic labeling of local regions. We introduce an exemplar-based nonparametric view of contextual cues, in which the inherent relationships implied by object hypotheses are encoded on a similarity graph of regions. Contextual relationship learning and propagation are then performed to estimate the pairwise contexts between all pairs of unlabeled local regions. Our algorithm integrates the learned contexts into a Conditional Random Field (CRF) as pairwise potentials and infers the per-region semantic labels. Experiments on the challenging YouTube-Objects dataset show that the proposed contextual relationship model outperforms state-of-the-art methods.
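The pipeline summarized in the abstract (similarity graph of regions, propagation of pairwise contexts, CRF with context-based pairwise potentials) can be illustrated with a small NumPy sketch. This is not the authors' method: the Gaussian-kernel graph, the propagation rule K ← αSKSᵀ + (1 − α)K₀ (a standard pairwise-constraint propagation scheme), the sign convention (positive context means "likely same label"), and ICM inference are all assumptions, and every function name is hypothetical.

```python
import numpy as np


def similarity_graph(features, sigma=1.0):
    """Hypothetical: Gaussian-kernel similarity graph over region feature vectors."""
    sq = np.sum((features[:, None, :] - features[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W


def propagate_pairwise_contexts(W, K0, alpha=0.8, iters=50):
    """Assumed propagation rule: K <- alpha * S K S^T + (1 - alpha) * K0.

    K0 holds seed contexts (+1 likely same label, -1 likely different, 0 unknown),
    S is the symmetrically normalized graph; the iteration converges for alpha < 1.
    """
    d = W.sum(axis=1)
    d[d == 0] = 1.0
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    S = D_inv_sqrt @ W @ D_inv_sqrt
    K = K0.copy()
    for _ in range(iters):
        K = alpha * S @ K @ S.T + (1.0 - alpha) * K0
    return K


def icm_crf(unary, K, lam=1.0, max_sweeps=20):
    """Approximate MAP labels for
    E(y) = sum_i U[i, y_i] + lam * sum_{i<j} K_ij * [y_i != y_j]
    using iterated conditional modes (an assumed inference choice)."""
    n, L = unary.shape
    K = K.copy()
    np.fill_diagonal(K, 0.0)
    y = unary.argmin(axis=1)  # start from the unary-only labeling
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            # Pairwise cost of giving region i label l, given the other current labels.
            disagree = np.array([(K[i] * (y != l)).sum() for l in range(L)])
            new = int((unary[i] + lam * disagree).argmin())
            if new != y[i]:
                y[i] = new
                changed = True
        if not changed:
            break
    return y


if __name__ == "__main__":
    # Toy data: two spatially separated pairs of regions.
    feats = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
    K0 = np.zeros((4, 4))
    K0[0, 1] = K0[1, 0] = 1.0  # seed: regions 0 and 1 share a label
    K0[2, 3] = K0[3, 2] = 1.0  # seed: regions 2 and 3 share a label
    K = propagate_pairwise_contexts(similarity_graph(feats), K0)
    # Unary costs for 2 labels: regions 1 and 3 are ambiguous on their own.
    U = np.array([[0.0, 5.0], [1.0, 0.8], [5.0, 0.0], [0.8, 1.0]])
    print(icm_crf(U, K))  # → [0 0 1 1]
```

In the toy run, regions 1 and 3 would be mislabeled by their unary costs alone; the positive pairwise contexts pull each toward the label of its confident partner. ICM is used here only because it is short and never increases the energy; it handles negative (repulsive) contexts but is prone to local minima.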