Retrieval-augmented generation (RAG) incorporates knowledge from domain-specific sources into large language models to ground answer generation. Current RAG systems lack customizable visibility into the context documents and the model's attentiveness toward them. We propose RAGViz, a RAG diagnosis tool that visualizes the attentiveness of generated tokens on retrieved documents. With a built-in user interface, retrieval index, and large language model (LLM) backbone, RAGViz provides two main functionalities: (1) token- and document-level attention visualization, and (2) generation comparison upon the addition and removal of context documents. As an open-source toolkit, RAGViz can be easily hosted with a custom embedding model and any HuggingFace-supported LLM backbone. Using a hybrid approximate nearest neighbor (ANN) index, a memory-efficient LLM inference tool, and a custom context-snippet method, RAGViz operates efficiently, with a median query time of about 5 seconds on a moderate GPU node. Our code is available at https://github.com/cxcscmu/RAGViz. A demo video of RAGViz can be found at https://youtu.be/cTAbuTu6ur4.