Known-item video search is effective when a human is in the loop to interactively inspect the search results and refine the initial query. Nevertheless, when the first few pages of results are swamped with visually similar items, or the search target is buried deep in the ranked list, finding the known-item target usually requires lengthy browsing and result inspection. This paper tackles the problem with reinforcement learning, aiming to reach the search target within a few rounds of interaction through long-term learning from user feedback. Specifically, the system interactively plans a navigation path based on user feedback and recommends, for the user to comment on, a potential target that maximizes the long-term reward. We conduct experiments on the challenging task of video corpus moment retrieval (VCMR), which localizes moments within a large video corpus. Experimental results on the TVR and DiDeMo datasets verify that the proposed approach is effective in retrieving moments hidden deep in the ranked lists of CONQUER and HERO, the state-of-the-art automatic search engines for VCMR.
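To make the idea of a "long-term reward" concrete, the sketch below computes the discounted return of one interactive search episode. It assumes a hypothetical reward scheme (a small cost for every round that does not surface the target and a fixed bonus for the round that does) and a discount factor `gamma = 0.9`; neither is the reward defined in this work. It only illustrates why an agent that maximizes discounted return prefers recommendation paths that reach the target in fewer rounds.

```python
from typing import List


def discounted_return(rewards: List[float], gamma: float = 0.9) -> float:
    """Sum of gamma**t * r_t over the rounds of one search episode."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total


def episode_rewards(rounds_to_hit: int,
                    step_cost: float = -0.1,
                    hit_reward: float = 1.0) -> List[float]:
    """Hypothetical reward scheme (an assumption, not the paper's):
    step_cost for each round that misses the target,
    hit_reward for the round whose recommendation is the target."""
    return [step_cost] * (rounds_to_hit - 1) + [hit_reward]


# Target found in round 2 versus round 6 of the interaction.
fast = discounted_return(episode_rewards(2))
slow = discounted_return(episode_rewards(6))
print(f"fast={fast:.3f} slow={slow:.3f}")
```

Under these assumed values the script prints `fast=0.800 slow=0.181`: the same hit is worth much less when it arrives after several unproductive rounds, which is the pressure that pushes a policy toward targets hidden deep in the ranked list being reached quickly.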