As large language models (LLMs) have demonstrated powerful capabilities across many domains and tasks, including context understanding, code generation, language generation, and data storytelling, many data analysts may worry that their jobs will be replaced by AI. This controversial topic has drawn considerable public attention, but opinions remain divided and no definitive conclusion has been reached. Motivated by this, we pose the research question "Is GPT-4 a good data analyst?" and aim to answer it through head-to-head comparative studies. Specifically, we treat GPT-4 as a data analyst and have it perform end-to-end data analysis on databases from a wide range of domains. We propose a framework for these tasks in which carefully designed prompts guide GPT-4 through the experiments. We also design several task-specific evaluation metrics to systematically compare the performance of several professional human data analysts with that of GPT-4. Experimental results show that GPT-4 can achieve performance comparable to that of humans. We also discuss our results in depth to shed light on further studies that are needed before concluding that GPT-4 can replace data analysts.