The Multimodal Video Search by Examples (MVSE) project investigates using video clips, rather than the more traditional text query, as the query term for information retrieval. This enables far richer search modalities such as images, speaker, content, topic, and emotion. A key element of this process is rapid, flexible search to support large archives, which in MVSE is facilitated by representing video attributes as embeddings. This work aims to mitigate any performance loss from this rapid archive search by examining reranking approaches. In particular, zero-shot reranking methods using large language models are investigated, as these are applicable to the audio content of any video archive. Performance is evaluated for topic-based retrieval on a publicly available video archive, the BBC Rewind corpus. Results demonstrate that reranking can improve retrieval ranking without the need for any task-specific training data.
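The two-stage idea described above, a fast embedding search followed by reranking of the shortlist, can be sketched as follows. This is a minimal illustration under stated assumptions, not the MVSE implementation: the clip identifiers, the two-dimensional embeddings, the transcripts, and the function names `first_pass`, `rerank`, and `toy_relevance` are all hypothetical. In particular, `toy_relevance` is a simple word-overlap score standing in for the relevance judgement that a zero-shot large language model would supply.

```python
import math
from typing import Callable, Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def first_pass(query_vec: Sequence[float],
               archive: Dict[str, Sequence[float]],
               k: int) -> List[str]:
    """Fast stage: return the k clip ids whose embeddings are closest to the query."""
    ranked = sorted(archive.items(),
                    key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [clip_id for clip_id, _ in ranked[:k]]


def rerank(candidates: List[str],
           score_fn: Callable[[str], float]) -> List[str]:
    """Slow stage: reorder the shortlist by a more expensive relevance score.

    sorted() is stable, so candidates with equal scores keep their
    first-pass order.
    """
    return sorted(candidates, key=score_fn, reverse=True)


# Hypothetical toy data: embeddings and transcripts are made up for illustration.
archive = {
    "clip_a": [1.0, 0.0],
    "clip_b": [0.9, 0.1],
    "clip_c": [0.0, 1.0],
}
transcripts = {
    "clip_a": "weather forecast for the weekend",
    "clip_b": "election results and political reaction",
    "clip_c": "football match highlights",
}
query_vec = [1.0, 0.05]
query_text = "political reaction to the election"


def toy_relevance(clip_id: str) -> float:
    """Assumption: word overlap stands in for an LLM-based zero-shot score."""
    query_words = set(query_text.split())
    clip_words = set(transcripts[clip_id].split())
    return float(len(query_words & clip_words))


shortlist = first_pass(query_vec, archive, k=2)
reranked = rerank(shortlist, toy_relevance)
print("first pass:", shortlist)
print("reranked:  ", reranked)
```

Only the shortlist is passed to the expensive scorer, so the cost of reranking depends on `k` rather than on the size of the archive.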