Generative AI (GAI) reveals an irreducible human core at the center of data science: advances in GAI should sharpen, rather than diminish, the focus on human reasoning in data science education. GAI can now execute many routine data science workflows, including cleaning, summarizing, visualizing, modeling, and drafting reports. Yet the competencies that matter most remain irreducibly human: problem formulation, measurement and design, causal identification, statistical and computational reasoning, ethics and accountability, and sensemaking. Drawing on Donoho's Greater Data Science framework, Nolan and Temple Lang's vision of computational literacy, and the McLuhan-Culkin insight that we shape our tools and thereafter our tools shape us, this paper traces the emergence of data science through three converging lineages: Tukey's intellectual vision of data analysis as a science, the commercial logic of surveillance capitalism that created industrial demand for data scientists, and the academic programs that followed. Mapping GAI's impact onto Donoho's six divisions of Greater Data Science shows that computing with data (GDS3) has been substantially automated, while data gathering, preparation, and exploration (GDS1) and science about data science (GDS6) still require essential human input. The educational implication is that data science curricula should focus on this human core while teaching students how to contribute effectively within iterative prompt-output-prompt cycles using retrieval-augmented generation, and that learning outcomes and assessments should explicitly evaluate reasoning and judgment.