Relational database-driven data analysis (RDB-DA) report generation, which aims to produce data analysis reports by querying relational databases, is widely applied in fields such as finance and healthcare. These tasks are typically completed manually by data scientists, which makes the process labor-intensive and creates a clear need for automation. Existing methods such as Table QA and Text-to-SQL reduce this human dependency, but they cannot handle complex analytical tasks that require multi-step reasoning, cross-table associations, and the synthesis of insights into reports. Moreover, no dataset is available for developing automatic RDB-DA report generation. To fill this gap, this paper proposes DAgent, an LLM agent system for RDB-DA report generation, and constructs a benchmark for automatic data analysis report generation that comprises a new dataset, DA-Dataset, and evaluation metrics. DAgent integrates planning, tool, and memory modules to decompose natural language questions into logically independent sub-queries, accurately retrieve key information from relational databases, and generate analytical reports that are complete, correct, and concise through multi-step reasoning and effective data integration. Experiments on DA-Dataset demonstrate DAgent's superiority in retrieval performance and report generation quality, showing strong potential for tackling complex database analysis report generation tasks.
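The plan, retrieve, and report flow described in the abstract can be illustrated with a minimal sketch. It is not DAgent's implementation: the names `plan`, `run_sql`, `Memory`, `write_report`, and `answer`, the demo schema, and the hard-coded sub-queries are all assumptions made for illustration, and the planner here returns fixed SQL where an actual system would presumably call an LLM.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Memory:
    """Hypothetical memory module: keeps each sub-query description and its rows."""
    records: list = field(default_factory=list)

    def add(self, description, rows):
        self.records.append((description, rows))


def plan(question):
    """Hypothetical planner returning independent (description, SQL) sub-queries.

    The decomposition is hard-coded for the demo schema below; it stands in
    for an LLM-based planning step and ignores the question text.
    """
    return [
        ("Total revenue per region",
         "SELECT c.region, SUM(o.amount) FROM orders o "
         "JOIN customers c ON c.id = o.customer_id "
         "GROUP BY c.region ORDER BY c.region"),
        ("Number of orders per region",
         "SELECT c.region, COUNT(*) FROM orders o "
         "JOIN customers c ON c.id = o.customer_id "
         "GROUP BY c.region ORDER BY c.region"),
    ]


def run_sql(conn, sql):
    """Tool: execute one sub-query against the relational database."""
    return conn.execute(sql).fetchall()


def write_report(question, memory):
    """Assemble the stored results into a short plain-text report."""
    lines = [f"Question: {question}"]
    for description, rows in memory.records:
        lines.append(f"{description}:")
        lines.extend(f"  {key}: {value}" for key, value in rows)
    return "\n".join(lines)


def answer(conn, question):
    """Run every planned sub-query, store the results, then write the report."""
    memory = Memory()
    for description, sql in plan(question):
        memory.add(description, run_sql(conn, sql))
    return write_report(question, memory)


def make_demo_db():
    """Build a tiny in-memory database with two related tables (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
        INSERT INTO customers VALUES (1, 'East'), (2, 'West');
        INSERT INTO orders VALUES (1, 1, 100.0), (2, 1, 50.0), (3, 2, 70.0);
    """)
    return conn


if __name__ == "__main__":
    print(answer(make_demo_db(), "How do regions compare on sales?"))
```

Both sub-queries join `orders` to `customers`, so even this toy case involves the cross-table association that the abstract says single-shot Table QA and Text-to-SQL methods struggle with. Keeping the planner, the SQL tool, and the memory as separate pieces means each could be replaced independently, for example by swapping the fixed plan for a model-generated one.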