SQL-based exploratory data analysis has garnered significant attention within the data analysis community. The emergence of large language models (LLMs) has facilitated a paradigm shift from manual to automated data exploration. However, existing methods generally lack support for cross-domain analysis, and the capabilities of LLMs remain underexplored. This paper presents TiInsight, an SQL-based automated cross-domain exploratory data analysis system. First, TiInsight offers a user-friendly GUI that lets users explore data using natural language queries. Second, TiInsight offers a robust cross-domain exploratory data analysis pipeline comprising hierarchical data context (HDC) generation, question clarification and decomposition, text-to-SQL (TiSQL), and data visualization (TiChart). Third, we have implemented and deployed TiInsight in the production environment of PingCAP and demonstrated its capabilities using representative datasets. The demo video is available at https://youtu.be/JzYFyYd-emI.
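To make the four pipeline stages concrete, the following is a minimal sketch of how they might be chained. It is not TiInsight's implementation: every function name, the `Exploration` record, and the stub logic are hypothetical, and each stub stands in for what would presumably be an LLM call or a rendering step in the real system.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Exploration:
    """Hypothetical record of one pass through the pipeline."""
    question: str
    context: str = ""
    sub_questions: List[str] = field(default_factory=list)
    sql: List[str] = field(default_factory=list)
    charts: List[str] = field(default_factory=list)


def generate_hdc(schema: Dict[str, List[str]]) -> str:
    # Stage 1 (assumed): summarise the schema into a textual data context.
    return "; ".join(f"{t}({', '.join(cols)})" for t, cols in sorted(schema.items()))


def clarify_and_decompose(question: str) -> List[str]:
    # Stage 2 (assumed): split a compound question into sub-questions.
    # A naive split on " and " is used here purely as a placeholder.
    parts = [p.strip(" ?") for p in question.split(" and ")]
    return [p + "?" for p in parts if p]


def text_to_sql(sub_question: str, context: str) -> str:
    # Stage 3 (assumed): translate a sub-question into SQL using the context.
    # Placeholder output; no real query is generated.
    return f"-- {sub_question}\n-- context: {context}\nSELECT 1;"


def choose_chart(sql: str) -> str:
    # Stage 4 (assumed): pick a visualization for the query result.
    return "table"


def explore(question: str, schema: Dict[str, List[str]]) -> Exploration:
    # Run the four stages in order, keeping every intermediate artifact.
    result = Exploration(question=question, context=generate_hdc(schema))
    result.sub_questions = clarify_and_decompose(question)
    result.sql = [text_to_sql(q, result.context) for q in result.sub_questions]
    result.charts = [choose_chart(s) for s in result.sql]
    return result


result = explore(
    "What are monthly sales and which region grew fastest?",
    {"orders": ["id", "region", "amount"]},
)
print(result.sub_questions)
```

Keeping the intermediate artifacts (context, sub-questions, SQL) on one record is only one possible design; it makes each stage easy to inspect or replace independently.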