The growing emphasis on 21st-century competencies in postsecondary education, intensified by the transformative impact of generative AI, underscores the need to evaluate how these competencies are embedded in curricula and how effectively academic programs align with evolving workforce and societal demands. Curricular Analytics, particularly in its recent generative AI-powered forms, offers a promising data-driven pathway. However, analyzing 21st-century competencies requires pedagogical reasoning beyond surface-level information retrieval, and the capabilities of large language models (LLMs) in this context remain underexplored. In this study, we extend prior curricular analytics research by examining a broader range of curriculum documents, competency frameworks, and models. Using 7,600 manually annotated curriculum-competency alignment scores, we assess the informativeness of different curriculum sources, benchmark general-purpose LLMs on curriculum-to-competency mapping, and analyze their error patterns. We further introduce a reasoning-based prompting strategy, Curricular CoT, to strengthen LLMs' pedagogical reasoning. Our results show that detailed instructional activity descriptions are the most informative type of curriculum document for competency analytics. Open-weight LLMs achieve accuracy comparable to proprietary models on coarse-grained tasks, demonstrating their scalability and cost-effectiveness for institutional use. However, no model reaches human-level precision in fine-grained pedagogical reasoning. Our proposed Curricular CoT yields modest improvements by reducing bias in instructional keyword inference and improving the detection of nuanced pedagogical evidence in long texts. Together, these findings highlight the untapped potential of institutional curriculum documents and provide an empirical foundation for advancing AI-driven curricular analytics.