In this paper, we study how to use LLMs for decision making that requires complex data analysis. We define Decision QA as the task of answering the best decision, $d_{best}$, given a decision-making question $Q$, business rules $R$, and a database $D$. Since no existing benchmark can evaluate Decision QA, we propose a Decision QA benchmark, DQA. It has two scenarios, Locating and Building, constructed from two video games (Europa Universalis IV and Victoria 3) whose goals closely resemble Decision QA. To address Decision QA effectively, we also propose a new RAG technique called iterative plan-then-retrieval augmented generation (PlanRAG). Our PlanRAG-based LM first generates a plan for decision making, and then the retriever generates queries for data analysis. The proposed method outperforms the state-of-the-art iterative RAG method by 15.8% in the Locating scenario and by 7.4% in the Building scenario. We release our code and benchmark at https://github.com/myeon9h/PlanRAG.
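To make the plan-then-retrieve idea concrete, here is a minimal toy sketch of such a loop. All names here (`Planner`, `Store`, `plan_rag`, and their methods) are hypothetical stand-ins for an LLM and a database, not the authors' implementation; the real PlanRAG uses an LM to produce both the plan and the queries.

```python
# Hedged sketch of an iterative plan-then-retrieve loop in the spirit of
# PlanRAG. Planner and Store are toy stand-ins, not the paper's API.

class Store:
    """Toy database: maps query strings to result rows."""
    def __init__(self, rows):
        self.rows = rows

    def run_query(self, query):
        return self.rows.get(query)

class Planner:
    """Toy 'LM': follows a fixed plan of queries, then decides."""
    def make_plan(self, question, rules):
        return ["q1", "q2"]  # queries the plan says are needed

    def next_query(self, plan, evidence):
        # Ask the next planned query; None means the plan is satisfied.
        return plan[len(evidence)] if len(evidence) < len(plan) else None

    def answer(self, question, plan, evidence):
        # Pick the best decision d_best from the gathered evidence.
        return max(evidence, key=lambda e: e["score"])["name"]

def plan_rag(question, rules, store, llm, max_steps=5):
    plan = llm.make_plan(question, rules)       # step 1: plan for decision making
    evidence = []
    for _ in range(max_steps):                  # step 2: iterative retrieval
        query = llm.next_query(plan, evidence)  # generate a data-analysis query
        if query is None:
            break                               # plan satisfied; stop retrieving
        evidence.append(store.run_query(query))
    return llm.answer(question, plan, evidence)

store = Store({"q1": {"name": "A", "score": 3},
               "q2": {"name": "B", "score": 7}})
print(plan_rag("which option?", rules=None, store=store, llm=Planner()))  # prints B
```

The key design point the sketch illustrates: planning happens once up front, and each retrieval step is conditioned on both the plan and the evidence gathered so far, rather than on the question alone.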