Answering queries with semantic concepts has long been the mainstream approach to video search. Only recently has its performance been surpassed by the concept-free approach, which embeds queries and videos in a joint space. Nevertheless, neither the embedded features nor the search results are interpretable, which hinders subsequent steps in video browsing and query reformulation. This paper integrates feature embedding and concept interpretation into a single neural network for unified dual-task learning. In this way, each embedding is associated with a list of semantic concepts that serves as an interpretation of the video content. This paper empirically demonstrates that, using either the embedding features or the concepts, considerable search improvement is attainable on TRECVid benchmark datasets. Concepts are not only effective in pruning false-positive videos, but also highly complementary to concept-free search, leading to a large margin of improvement over state-of-the-art approaches.
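As a rough illustration of how concepts might both prune false positives and complement concept-free search, the sketch below ranks videos by a linear fusion of an embedding similarity and a concept score. It is a minimal sketch under stated assumptions, not the method of this paper: the function names, the fusion weight `weight`, the pruning threshold `prune_below`, and the toy vectors and concept probabilities are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def concept_score(query_concepts, video_concepts):
    """Mean predicted probability of the query concepts in one video."""
    if not query_concepts:
        return 0.0
    total = sum(video_concepts.get(c, 0.0) for c in query_concepts)
    return total / len(query_concepts)


def search(query_vec, query_concepts, videos, weight=0.5, prune_below=0.1):
    """Rank videos by fused score; weight and prune_below are assumed values."""
    ranked = []
    for video_id, (video_vec, video_concepts) in videos.items():
        c = concept_score(query_concepts, video_concepts)
        # Assumed pruning rule: drop videos whose concepts barely match the query.
        if query_concepts and c < prune_below:
            continue
        e = cosine(query_vec, video_vec)
        # Assumed late fusion: weighted sum of embedding and concept scores.
        ranked.append((video_id, weight * e + (1.0 - weight) * c))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Toy data (hypothetical): video id -> (embedding, concept probabilities).
videos = {
    "v1": ([1.0, 0.0], {"dog": 0.9, "beach": 0.8}),
    "v2": ([0.9, 0.1], {"car": 0.95}),
    "v3": ([0.0, 1.0], {"dog": 0.6, "beach": 0.2}),
}
results = search([1.0, 0.0], ["dog", "beach"], videos)
```

In this toy setting, `v2` lies close to the query in the embedding space but shares none of the query concepts, so the concept check removes it from the ranking. The concept list attached to each remaining video also gives a readable explanation of why it was retrieved.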