Case-based reasoning (CBR) is an experience-based approach to problem solving in which a repository of solved cases is adapted to solve new cases. Recent research shows that Large Language Models (LLMs) with Retrieval-Augmented Generation (RAG) can support the Retrieve and Reuse stages of the CBR pipeline by retrieving similar cases and supplying them as additional context to an LLM query. Most studies have focused on text-only applications; however, in many real-world problems the components of a case are multimodal. In this paper we present MCBR-RAG, a general RAG framework for multimodal CBR applications. The MCBR-RAG framework converts non-text case components into text-based representations, allowing it to: 1) learn application-specific latent representations that can be indexed for retrieval, and 2) enrich the query provided to the LLM by incorporating all case components for better context. We demonstrate MCBR-RAG's effectiveness through experiments conducted on a simplified Math-24 application and a more complex Backgammon application. Our empirical results show that MCBR-RAG improves generation quality compared to a baseline LLM with no contextual information provided.
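To make the Retrieve and Reuse stages concrete, the following is a minimal sketch, not the paper's implementation: case components are assumed to already be textualized, and a toy bag-of-words embedding stands in for the learned application-specific latent representations. All function and field names (`retrieve`, `build_prompt`, `problem`, `solution`) are illustrative.

```python
from collections import Counter
import math

def text_embed(text):
    # Toy bag-of-words embedding; a stand-in for a learned latent representation.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_text, case_base, k=1):
    # Retrieve stage: rank stored cases by similarity to the new problem.
    q = text_embed(query_text)
    ranked = sorted(case_base,
                    key=lambda c: cosine(q, text_embed(c["problem"])),
                    reverse=True)
    return ranked[:k]

def build_prompt(query_text, retrieved):
    # Reuse stage: enrich the LLM query with retrieved cases as context.
    context = "\n".join(f"Problem: {c['problem']}\nSolution: {c['solution']}"
                        for c in retrieved)
    return f"Similar solved cases:\n{context}\n\nNew problem: {query_text}"

# Hypothetical Math-24-style case base whose components have been textualized.
cases = [
    {"problem": "make 24 from 4 6 1 1", "solution": "4 * 6 * 1 * 1"},
    {"problem": "make 24 from 3 8 1 1", "solution": "3 * 8 * 1 * 1"},
]
query = "make 24 from 4 6 2 2"
prompt = build_prompt(query, retrieve(query, cases, k=1))
```

In a full MCBR-RAG system, `text_embed` would be replaced by the learned latent encoder and `prompt` would be sent to the LLM; this sketch only illustrates the retrieval-then-enrichment flow.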