Recent years have seen a growing volume of dialogue and conversation on the web, especially on social media. This growth has motivated work on dialogue-based retrieval, and retrieving videos from dialogue is of increasing interest for recommendation systems. Unlike other video retrieval tasks, dialogue-to-video retrieval uses a structured query, in the form of user-generated dialogue, as the search descriptor. We present a novel dialogue-to-video retrieval system that incorporates structured conversational information. Experiments on the AVSD dataset show that our approach with plain-text queries improves over the previous counterpart model by 15.8% on R@1. Furthermore, our approach using dialogue as a query improves retrieval performance by 4.2%, 6.2%, and 8.6% on R@1, R@5, and R@10, and outperforms the state-of-the-art model by 0.7%, 3.6%, and 6.0% on R@1, R@5, and R@10, respectively.
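
The results above are reported as R@K (recall at K), commonly defined as the fraction of queries whose correct video appears among the top K retrieved candidates. The following is a minimal sketch of that metric, not the evaluation code used in this work. It assumes a square similarity matrix in which entry (i, j) scores query i against video j, and that video i is the single correct match for query i; the function name `recall_at_k` and the toy scores are hypothetical.

```python
from typing import Sequence


def recall_at_k(similarity: Sequence[Sequence[float]], k: int) -> float:
    """Fraction of queries whose correct video ranks in the top k.

    Assumption for illustration: similarity[i][j] scores query i against
    video j, and video i is the only correct match for query i.
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Rank of the correct video = number of videos scoring strictly higher.
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return hits / len(similarity)


# Toy scores for three dialogue queries against three videos (made up).
scores = [
    [0.9, 0.1, 0.3],  # correct video ranked first
    [0.8, 0.2, 0.5],  # correct video ranked third
    [0.2, 0.4, 0.3],  # correct video ranked second
]

for k in (1, 2, 3):
    print(f"R@{k} = {recall_at_k(scores, k):.3f}")
```

On these toy scores the correct video ranks first, third, and second for the three queries, so R@1 is 1/3, R@2 is 2/3, and R@3 is 1.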