The development of data science expertise requires tacit, process-oriented skills that are difficult to teach directly. This study addresses the resulting challenge of understanding empirically how the problem-solving processes of experts and novices differ. We apply a multi-level sequence analysis to 440 Jupyter notebooks from a public dataset, mapping low-level coding actions to higher-level problem-solving practices. Our findings reveal that experts and novices do not differ fundamentally in how they transition between data science phases (e.g., Data Import, EDA, Model Training, Visualization). Instead, expertise is distinguished at two other levels: the overall structure of the workflow, viewed from a problem-solving perspective, and fine-grained, cell-level action patterns. Novices tend to follow long, linear processes, whereas experts employ shorter, more iterative strategies enacted through efficient, context-specific action sequences. These results provide data science educators with empirical insights for curriculum design and assessment, shifting the focus from final products toward the development of the flexible, iterative thinking that defines expertise, a priority in a field increasingly shaped by AI tools.
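To make the idea of phase-level sequence analysis concrete, the following is a minimal sketch and not the study's actual pipeline. It assumes each notebook cell has already been assigned one phase label; the example sequence, the function names, and the "revisit" measure of iteration are all hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import groupby

# Hypothetical phase labels for one notebook, one label per code cell.
# The labels reuse the phase names from the abstract; the sequence is invented.
cells = [
    "Data Import", "EDA", "EDA", "Visualization",
    "EDA", "Model Training", "Model Training", "Visualization",
]

def collapse_runs(labels):
    """Merge consecutive cells with the same phase into one phase visit."""
    return [label for label, _ in groupby(labels)]

def transition_counts(labels):
    """Count how often one phase is immediately followed by a different phase."""
    phases = collapse_runs(labels)
    return Counter(zip(phases, phases[1:]))

def revisit_count(labels):
    """Count returns to an already-visited phase (an assumed proxy for iteration)."""
    phases = collapse_runs(labels)
    return len(phases) - len(set(phases))

if __name__ == "__main__":
    for (src, dst), n in transition_counts(cells).items():
        print(f"{src} -> {dst}: {n}")
    print("phase visits:", len(collapse_runs(cells)))
    print("revisits:", revisit_count(cells))
```

Under these assumptions, two notebooks could have similar transition counts while still differing in length and in how often they return to earlier phases, which is the kind of contrast the abstract draws between phase transitions and overall workflow structure.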