This paper presents an empirical evaluation of the performance of the Generative Pre-trained Transformer (GPT) model in Harvard's CS171 data visualization course. While previous studies have focused on GPT's ability to generate code for visualizations, this study goes beyond code generation to evaluate GPT's abilities in a range of visualization tasks, such as data interpretation, visualization design, visual data exploration, and insight communication. The evaluation used GPT-3.5 and GPT-4 to complete CS171 assignments and comprised a quantitative assessment based on the established course rubrics, a qualitative analysis informed by feedback from three experienced graders, and an exploratory study of GPT's capabilities in completing broader visualization tasks. Findings show that GPT-4 scored 80% on quizzes and homework, and that teaching fellows (TFs) could distinguish between GPT-generated and human-generated homework with 70% accuracy. The study also demonstrates GPT's potential in completing various visualization tasks, such as data cleanup, interaction with visualizations, and insight communication. The paper concludes by discussing the strengths and limitations of GPT in data visualization, potential avenues for incorporating GPT into broader visualization tasks, and the need to redesign visualization education.