Constructing a Q-matrix is a critical but labor-intensive step in cognitive diagnostic modeling (CDM). This study investigates whether AI tools (i.e., general-purpose large language models) can support Q-matrix development by comparing AI-generated Q-matrices with the validated Q-matrix from Li and Suen (2013) for a reading comprehension test. In May 2025, multiple AI models received the same training materials as the human experts. Agreement among the AI-generated Q-matrices, the validated Q-matrix, and the human raters' Q-matrices was assessed with Cohen's kappa. Results showed substantial variation across AI models: Google Gemini 2.5 Pro achieved the highest agreement with the validated Q-matrix (kappa = 0.63), exceeding that of every human expert. A follow-up analysis in January 2026 using newer model versions, however, yielded lower agreement with the validated Q-matrix. Implications and directions for future research are discussed.
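As a minimal sketch of the agreement statistic used above, the following computes Cohen's kappa between two flattened binary Q-matrices. The entries shown are hypothetical illustrations, not the study's data, and the helper name `cohens_kappa` is an assumption for this example:

```python
# Illustrative sketch: Cohen's kappa between two flattened binary Q-matrices
# (items x attributes, entries coded 0/1). Data below are hypothetical.

def cohens_kappa(a, b):
    """Cohen's kappa for two equal-length binary rating vectors."""
    assert len(a) == len(b) and len(a) > 0
    n = len(a)
    # observed proportion of agreement
    po = sum(x == y for x, y in zip(a, b)) / n
    # expected agreement under independence of the two raters
    pa1 = sum(a) / n
    pb1 = sum(b) / n
    pe = pa1 * pb1 + (1 - pa1) * (1 - pb1)
    return (po - pe) / (1 - pe)

# Hypothetical flattened Q-matrices from a validated source and an AI rater
validated = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
ai_rated  = [1, 0, 1, 0, 0, 0, 1, 1, 1, 1]
print(round(cohens_kappa(validated, ai_rated), 2))  # prints 0.58
```

In practice, a library implementation such as scikit-learn's `cohen_kappa_score` can replace the hand-rolled function; the manual version is shown only to make the observed-versus-expected-agreement correction explicit.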