DataLab: A Unified Platform for LLM-Powered Business Intelligence

Luoxuan Weng, Yinghao Tang, Yingchaojie Feng, Zhuo Chang, Peng Chen, Ruiqin Chen, Haozhe Feng, Chen Hou, Danqing Huang, Yang Li, Huaming Rao, Haonan Wang, Canshi Wei, Xiaofeng Yang, Yuhui Zhang, Yifeng Zheng, Xiuqi Huang, Minfeng Zhu, Yuxin Ma, Bin Cui, Wei Chen

Business intelligence (BI) transforms large volumes of data within modern organizations into actionable insights for informed decision-making. Recently, large language model (LLM)-based agents have streamlined the BI workflow by automatically performing task planning, reasoning, and actions in executable environments based on natural language (NL) queries. However, existing approaches primarily focus on individual BI tasks such as NL2SQL and NL2VIS. Because BI is iterative and collaborative, the fragmentation of tasks across different data roles and tools leads to inefficiencies and potential errors. In this paper, we introduce DataLab, a unified BI platform that integrates a one-stop LLM-based agent framework with an augmented computational notebook interface. DataLab supports a wide range of BI tasks for different data roles by seamlessly combining LLM assistance with user customization within a single environment. To achieve this unification, we design a domain knowledge incorporation module tailored for enterprise-specific BI tasks, an inter-agent communication mechanism to facilitate information sharing across the BI workflow, and a cell-based context management strategy to enhance context utilization efficiency in BI notebooks. Extensive experiments demonstrate that DataLab achieves state-of-the-art performance on various BI tasks across popular research benchmarks. Moreover, DataLab maintains high effectiveness and efficiency on real-world datasets from Tencent, achieving up to a 58.58% increase in accuracy and a 61.65% reduction in token cost on enterprise-specific BI tasks.
