Exploratory Data Analysis (EDA) is an essential yet tedious process for examining a new dataset. To facilitate it, natural language interfaces (NLIs) let people explore a dataset intuitively through data-oriented questions. However, existing NLIs primarily focus on providing accurate answers to questions, and few explain or present the data analysis pipeline used to produce the answer. Such presentations are crucial for EDA: they enhance the interpretability and reliability of the answer and help users understand the analysis process and derive insights. To fill this gap, we introduce Urania, a natural language interactive system that visualizes the data analysis pipelines used to resolve input questions. It integrates a natural language interface that allows users to explore data via questions and a novel data-aware question decomposition algorithm that resolves each input question into a data analysis pipeline. This pipeline is visualized as a datamation, with animated presentations of the analysis operations and their corresponding data changes. Through two quantitative experiments and expert interviews, we demonstrate that our data-aware question decomposition algorithm outperforms the state-of-the-art technique in terms of execution accuracy, and that Urania helps people explore datasets better. Finally, we discuss observations from the studies and potential directions for future work.