LeanContext: Cost-Efficient Domain-Specific Question Answering Using LLMs

Question answering (QA) is a significant application of Large Language Models (LLMs), shaping chatbot capabilities across healthcare, education, and customer service. However, widespread LLM integration is a challenge for small businesses because of the high cost of LLM API usage. Costs rise rapidly when domain-specific data (context) is sent alongside queries to obtain accurate domain-specific LLM responses. One option is to use LLMs to summarize the context and thereby shorten it. However, summarization can also filter out information that is needed to answer some domain-specific queries. In this paper, we shift from human-oriented summarizers to AI model-friendly summaries. Our approach, LeanContext, efficiently extracts $k$ key sentences from the context that are closely aligned with the query. The choice of $k$ is neither static nor random; we introduce a reinforcement learning technique that dynamically determines $k$ based on the query and context. The remaining, less important sentences are reduced using a free, open-source text reduction method. We evaluate LeanContext against several recent query-aware and query-unaware context reduction approaches on prominent datasets (arXiv papers and BBC news articles). Despite cost reductions of $37.29\%$ to $67.81\%$, LeanContext's ROUGE-1 score decreases by only $1.41\%$ to $2.65\%$ compared to a baseline that retains the entire context (no summarization). Additionally, if free pretrained LLM-based summarizers are used to reduce the context (into human-consumable summaries), LeanContext can further modify the reduced context to improve accuracy (ROUGE-1 score) by $13.22\%$ to $24.61\%$.
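
The query-aware selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a plain bag-of-words cosine similarity as a stand-in for whatever similarity measure LeanContext uses, it takes $k$ as a fixed argument instead of learning it with reinforcement learning, and it omits the reduction of the remaining sentences. The names `tokenize`, `cosine`, and `top_k_sentences` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_sentences(context, query, k):
    """Return the k sentences most similar to the query, in original order.

    Assumption: similarity is bag-of-words cosine, and k is supplied by the
    caller (the paper determines k dynamically; that part is not shown here).
    """
    # Naive sentence splitting on ., !, or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", context) if s.strip()]
    q = Counter(tokenize(query))
    scored = [(cosine(Counter(tokenize(s)), q), i) for i, s in enumerate(sentences)]
    # Highest score first; ties are broken by original position.
    best = sorted(scored, key=lambda p: (-p[0], p[1]))[:k]
    keep = sorted(i for _, i in best)
    return [sentences[i] for i in keep]


context = (
    "LeanContext selects key sentences from the context. "
    "The weather was pleasant on the day of the launch. "
    "Selected sentences are the ones most similar to the query. "
    "Lunch was served at noon."
)
query = "Which sentences are selected from the context?"
reduced = top_k_sentences(context, query, k=2)
print(reduced)
```

In this toy example the two sentences that share the most vocabulary with the query are kept and the unrelated ones are dropped, which shortens the prompt sent to the LLM. A real system would likely replace the bag-of-words scoring with sentence embeddings.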
