In this paper, we focus on methods for reducing the size and improving the quality of the prompt context required by question-answering systems. Increasing the number of retrieved document chunks to enlarge the context related to a query can significantly complicate processing and degrade the performance of a Large Language Model (LLM) when it generates responses. It is well known that a large set of documents retrieved from a database in response to a query may contain irrelevant information, which often leads to hallucinations in the resulting answers. Our goal is to select the most semantically relevant documents and to treat the remaining ones as outliers to be discarded. We propose and evaluate several methods for identifying such outliers, based on features built from the distances of the embedding vectors retrieved from the vector database to both their centroid and the query vector. The methods were evaluated by measuring the similarity of the resulting LLM responses to ground-truth answers obtained using the OpenAI GPT-4o model. The improvements were greatest for the most complex questions and answers.
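To illustrate the kind of distance features described above, the following Python sketch computes, for each retrieved embedding, its cosine distance to the centroid of the retrieved set and to the query embedding. It then discards any document that lies above a Tukey fence (Q3 + 1.5 * IQR) on either feature. The cosine metric, the Tukey fence rule, and all function names are assumptions chosen for illustration. They are not the specific outlier detectors evaluated in this paper.

```python
import math
from statistics import quantiles


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def centroid(vectors):
    """Component-wise mean of the retrieved embedding vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def distance_features(query, docs):
    """Per-document feature pair: (distance to centroid, distance to query)."""
    c = centroid(docs)
    return [(cosine_distance(d, c), cosine_distance(d, query)) for d in docs]


def upper_fence(values, k=1.5):
    """Tukey upper fence Q3 + k * IQR (assumed rule; needs >= 2 values)."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 + k * (q3 - q1)


def filter_outliers(query, docs, k=1.5):
    """Return indices of documents inside the fence on BOTH features.

    Documents beyond either fence are treated as outliers and dropped.
    This combination rule is a hypothetical choice for illustration.
    """
    feats = distance_features(query, docs)
    fence_centroid = upper_fence([f[0] for f in feats], k)
    fence_query = upper_fence([f[1] for f in feats], k)
    return [
        i
        for i, (d_centroid, d_query) in enumerate(feats)
        if d_centroid <= fence_centroid and d_query <= fence_query
    ]


# Toy 2-D "embeddings": five documents roughly aligned with the query
# and one orthogonal to it.
query = [1.0, 0.0]
docs = [[1.0, 0.1], [1.0, 0.05], [0.95, 0.1], [1.0, 0.0], [0.9, 0.15], [0.0, 1.0]]
print(filter_outliers(query, docs))  # prints [0, 1, 2, 3, 4]
```

In this toy example, only the orthogonal document exceeds the fences on both features and is removed. A quantile-based fence avoids assuming a distribution for the distances. However, with the small retrieved sets typical of RAG pipelines, the quartile estimates are noisy, so `k` would need tuning in practice.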