Digital learning platforms are increasingly used to support reading development, and they generate rich log files and item-level textual content. Using these data, this study proposes a dynamic cognitive diagnostic modelling (CDM) framework that incorporates text-derived semantic information to inform estimation of the Q-matrix, the binary item-by-skill matrix that records which skills each item requires. We construct item-level semantic representations of the question text and response options, and use these representations to define an informative prior on the Q-matrix. This approach treats text-derived signals as proxies for item complexity and cognitive demands, guiding the item-skill mapping in a data-driven manner. The model jointly estimates latent skill mastery profiles, item parameters, and transition dynamics over time using Bayesian estimation. We apply the model to data from Boost Reading, a digital reading supplement, focusing on students' vocabulary and comprehension skill development. We compare the proposed framework with a baseline model that uses no text information and show that the text-derived prior can improve Q-matrix recovery, particularly in settings where response data alone provide limited identification, and can also improve recovery of other model parameters under varying scenarios. This study provides a novel integration of natural language processing and dynamic CDMs, offering a data-driven approach to modelling skill acquisition and item-skill relationships in digital learning environments.
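To make the idea of a text-derived prior concrete, the following is a minimal sketch of one way item embeddings could be turned into Bernoulli prior probabilities for the Q-matrix entries. The abstract does not specify the construction, so everything here is an assumption for illustration: the use of cosine similarity between item embeddings and skill-description embeddings, the logistic link, the `temperature` and `floor` parameters, the function names, and the toy two-dimensional embeddings.

```python
import numpy as np


def cosine_similarity(a, b):
    """Row-wise cosine similarity between two embedding matrices."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def q_matrix_prior(item_emb, skill_emb, temperature=0.25, floor=0.05):
    """Hypothetical mapping from embeddings to prior P(q_jk = 1).

    item_emb  : (J, D) array, one embedding per item (question + options)
    skill_emb : (K, D) array, one embedding per skill description
    Returns a (J, K) array of prior probabilities.
    """
    sim = cosine_similarity(item_emb, skill_emb)      # values in [-1, 1]
    prior = 1.0 / (1.0 + np.exp(-sim / temperature))  # logistic link (assumed)
    # Keep the prior away from 0 and 1 so that response data can still
    # overturn the text signal during posterior estimation.
    return np.clip(prior, floor, 1.0 - floor)


# Toy embeddings, not real data: three items, two skills.
item_emb = np.array([[1.0, 0.1],    # close to skill 0
                     [0.1, 1.0],    # close to skill 1
                     [0.7, 0.7]])   # equally close to both skills
skill_emb = np.array([[1.0, 0.0],
                      [0.0, 1.0]])

prior = q_matrix_prior(item_emb, skill_emb)
print(np.round(prior, 3))
```

In a Bayesian sampler, each entry of `prior` would serve as the success probability of an independent Bernoulli prior on the corresponding Q-matrix entry; a baseline without text information would correspond to a constant matrix such as 0.5 everywhere.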