Exploratory Data Analysis (EDA), the process of discovering meaningful insights in a large dataset, is a challenging task that requires thorough exploration and analysis of the data. Automated Data Exploration (ADE) systems use goal-oriented methods with Large Language Models and Reinforcement Learning to move toward full automation. However, goal-oriented methods require human involvement to anticipate goals in advance, which may limit the insights that can be extracted, while fully automated systems demand significant computational resources and must be retrained for each new dataset. We introduce QUIS, a fully automated EDA system that operates in two stages: question generation (QUGen), which drives insight generation (ISGen). The QUGen module generates questions iteratively, refining them using the questions from previous iterations to improve coverage without human intervention or manually curated examples. The ISGen module analyzes the data to produce multiple relevant insights in response to each question; it requires no prior training, which enables QUIS to adapt to new datasets.
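The two-stage flow described above can be illustrated with a minimal sketch. It is not the QUIS implementation: the function names `qugen`, `isgen`, and `toy_propose`, the representation of a question as a (measure, breakdown) pair, and the toy rows are all assumptions made for illustration. In particular, `toy_propose` is a stand-in for the LLM call, and the insight search is reduced to comparing group means.

```python
from collections import defaultdict
from typing import Callable, Iterable

# Hypothetical representation: a question is a (measure, breakdown) column pair.
Question = tuple[str, str]
Row = dict[str, object]


def qugen(columns: list[str],
          propose: Callable[[list[str], list[Question]], Iterable[Question]],
          iterations: int = 3) -> list[Question]:
    """Collect questions over several rounds, feeding earlier ones back in."""
    accepted: list[Question] = []
    for _ in range(iterations):
        for q in propose(columns, accepted):
            if q not in accepted:  # keep only questions that add coverage
                accepted.append(q)
    return accepted


def isgen(rows: list[Row], question: Question) -> list[str]:
    """Answer one question with plain aggregate statistics (no training step)."""
    measure, breakdown = question
    groups: dict[object, list[float]] = defaultdict(list)
    for row in rows:
        groups[row[breakdown]].append(float(row[measure]))
    means = {key: sum(vals) / len(vals) for key, vals in groups.items()}
    top = max(means, key=means.get)
    low = min(means, key=means.get)
    return [
        f"{top} has the highest mean {measure} by {breakdown}",
        f"{low} has the lowest mean {measure} by {breakdown}",
    ]


def toy_propose(columns, previous):
    """Stand-in for an LLM: propose one breakdown that has not been used yet."""
    used = {breakdown for _, breakdown in previous}
    return [("sales", c) for c in columns if c != "sales" and c not in used][:1]


# Toy data, invented for this example only.
rows = [
    {"region": "north", "channel": "web", "sales": 10},
    {"region": "south", "channel": "web", "sales": 30},
    {"region": "south", "channel": "store", "sales": 50},
]
questions = qugen(["region", "channel", "sales"], toy_propose)
insights = {q: isgen(rows, q) for q in questions}
```

Because the proposer receives the questions accepted so far, each round can steer toward uncovered parts of the schema, which is the role the abstract assigns to iterative refinement. Each question yields several insights, and nothing in `isgen` depends on a particular dataset.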