Student Assessment in Cybersecurity Training Automated by Pattern Mining and Clustering

Hands-on cybersecurity training allows students and professionals to practice various tools and improve their technical skills. The training occurs in an interactive learning environment in which trainees complete sophisticated tasks in full-fledged operating systems, networks, and applications. During the training, the learning environment can collect data about trainees' interactions with it, such as their usage of command-line tools. These data contain patterns indicative of trainees' learning processes, and revealing these patterns makes it possible to assess the trainees and provide feedback that helps them learn. However, automated analysis of these data is challenging. The training tasks feature complex problem-solving, and many different solution approaches are possible. Moreover, the trainees generate vast amounts of interaction data. This paper explores a dataset from 18 cybersecurity training sessions using data mining and machine learning techniques. We employed pattern mining and clustering to analyze 8834 commands collected from 113 trainees, revealing their typical behavior, mistakes, solution strategies, and difficult training stages. Pattern mining proved suitable for capturing timing information and tool usage frequency. Clustering showed that many trainees face the same issues, which can be addressed by targeted scaffolding. Our results show that data mining methods are suitable for analyzing cybersecurity training data. Educational researchers and practitioners can apply these methods in their own contexts to assess trainees, support them, and improve the training design. Artifacts associated with this research are publicly available.
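
To make the idea of mining patterns from command logs concrete, the following is a minimal sketch, not the method used in this paper. It assumes each trainee's log is a list of command-line strings and counts how many trainees used each contiguous sequence of tools, which is a simple form of sequential pattern mining. The function name `frequent_tool_sequences`, its parameters, and the sample logs are all hypothetical and invented for illustration.

```python
from collections import Counter


def frequent_tool_sequences(logs, length=2, min_support=2):
    """Return contiguous tool sequences used by at least `min_support` trainees.

    `logs` maps a trainee identifier to that trainee's ordered list of
    command-line strings (an assumed input format, not the paper's).
    """
    support = Counter()
    for commands in logs.values():
        # Keep only the tool name (first token) of each non-empty command.
        tools = [c.split()[0] for c in commands if c.strip()]
        # Collect each sequence once per trainee, so support counts trainees.
        seen = {
            tuple(tools[i:i + length])
            for i in range(len(tools) - length + 1)
        }
        support.update(seen)
    return {seq: n for seq, n in support.items() if n >= min_support}


# Hypothetical logs, invented for illustration (not from the paper's dataset).
logs = {
    "trainee_a": ["nmap -sV 10.0.0.5", "ssh admin@10.0.0.5", "ls -la"],
    "trainee_b": ["nmap 10.0.0.5", "ssh root@10.0.0.5", "cat flag.txt"],
    "trainee_c": ["ping 10.0.0.5", "nmap -p 22 10.0.0.5", "ssh admin@10.0.0.5"],
}

patterns = frequent_tool_sequences(logs)
print(patterns)
```

In this toy input, the only tool pair shared by at least two trainees is `nmap` followed by `ssh`. Counting support per trainee rather than per occurrence keeps one very active trainee from dominating the result; a real analysis would likely also need to account for timing and command arguments.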
