Retrieval-augmented Large Language Models (LLMs) have reshaped traditional query-answering systems, offering unparalleled user experiences. However, existing retrieval techniques often struggle to handle multi-modal query contexts. In this paper, we present an interactive Multi-modal Query Answering (MQA) system, built on our newly developed multi-modal retrieval framework and navigation graph index and integrated with cutting-edge LLMs. It comprises five core components: Data Preprocessing, Vector Representation, Index Construction, Query Execution, and Answer Generation, all orchestrated by a dedicated coordinator to ensure smooth data flow from input to answer generation. One notable aspect of MQA is its use of contrastive learning to assess the importance of different modalities, which enables precise measurement of multi-modal information similarity. Furthermore, the system achieves efficient retrieval through our navigation graph index, which is refined using computational pruning techniques. Another highlight of our system is its pluggable processing framework, which allows embedding models, graph indexes, and LLMs to be swapped in seamlessly. This flexibility gives users diverse options for gaining insights from their multi-modal knowledge base. A preliminary video introduction of MQA is available at https://youtu.be/xvUuo2ZIqWk.
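As a rough illustration of the retrieval step described in the abstract, the following Python sketch combines per-modality similarities using modality weights and runs a greedy best-first search over a small neighbor graph. It is not the MQA implementation. The weights are fixed by hand here, whereas the system learns modality importance with contrastive learning, and no pruning is applied to the graph. The names `weighted_similarity` and `greedy_graph_search`, the parameter `ef`, and the toy data are all assumptions made for this example.

```python
import heapq
import numpy as np


def weighted_similarity(query, item, weights):
    """Sum of per-modality inner products, scaled by the modality weights."""
    return sum(weights[m] * float(np.dot(query[m], item[m])) for m in weights)


def greedy_graph_search(graph, items, query, weights, entry, k=1, ef=8):
    """Best-first search over a neighbor graph; returns ids of the top-k items.

    graph: dict mapping item id -> list of neighbor ids (hypothetical layout)
    ef:    how many of the best items seen so far are kept while searching
    """
    visited = {entry}
    s0 = weighted_similarity(query, items[entry], weights)
    candidates = [(-s0, entry)]  # max-heap on similarity (negated scores)
    results = [(s0, entry)]      # min-heap holding the best `ef` items so far
    while candidates:
        neg_s, node = heapq.heappop(candidates)
        # Stop once the best remaining candidate cannot improve the result set.
        if len(results) >= ef and -neg_s < results[0][0]:
            break
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            s = weighted_similarity(query, items[nb], weights)
            if len(results) < ef or s > results[0][0]:
                heapq.heappush(candidates, (-s, nb))
                heapq.heappush(results, (s, nb))
                if len(results) > ef:
                    heapq.heappop(results)
    return [i for _, i in sorted(results, reverse=True)[:k]]


# Toy knowledge base: four objects, each with a text and an image vector.
items = [
    {"text": np.array([1.0, 0.0]), "image": np.array([1.0, 0.0])},
    {"text": np.array([0.0, 1.0]), "image": np.array([1.0, 0.0])},
    {"text": np.array([0.0, 1.0]), "image": np.array([0.0, 1.0])},
    {"text": np.array([1.0, 0.0]), "image": np.array([0.0, 1.0])},
]
# Hand-built ring graph standing in for a navigation graph index.
graph = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
query = {"text": np.array([0.0, 1.0]), "image": np.array([0.0, 1.0])}

text_heavy = greedy_graph_search(
    graph, items, query, {"text": 0.7, "image": 0.3}, entry=0, k=2, ef=4
)
image_heavy = greedy_graph_search(
    graph, items, query, {"text": 0.3, "image": 0.7}, entry=0, k=2, ef=4
)
```

In this toy setup the best match is the same under both weightings, while the runner-up changes with the weights, which is the effect the modality weighting is meant to capture. In a pluggable design, the similarity function and the graph could each be replaced without changing the search loop.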