Data analysis is challenging because it requires synthesizing domain knowledge, statistical expertise, and programming skills. Assistants powered by large language models (LLMs), such as ChatGPT, can help analysts by translating natural language instructions into code. However, an AI assistant's responses and analysis code can be misaligned with the analyst's intent, or can appear correct yet lead to incorrect conclusions. Validating AI assistance is therefore both crucial and challenging. Here, we explore how analysts understand and verify the correctness of AI-generated analyses. To observe the diverse verification approaches analysts take, we develop a design probe equipped with natural language explanations, code, visualizations, and interactive data tables that support common data operations. Through a qualitative user study (n=22) using this probe, we uncover common behaviors within verification workflows and show how these behaviors relate to analysts' programming, analysis, and tool backgrounds. Additionally, we provide recommendations for analysts and highlight opportunities for designers to improve future AI-assistant experiences.