The growing emphasis on 21st-century competencies in postsecondary education, intensified by the transformative impact of generative AI, underscores the need to evaluate how these competencies are embedded in curricula and how effectively academic programs align with evolving workforce and societal demands. Curricular Analytics, particularly in its recent generative AI-powered forms, offers a promising data-driven pathway. However, analyzing 21st-century competencies requires pedagogical reasoning beyond surface-level information retrieval, and the capabilities of large language models (LLMs) in this context remain underexplored. In this study, we extend prior curricular analytics research by examining a broader range of curriculum documents, competency frameworks, and models. Using 7,600 manually annotated curriculum-competency alignment scores, we assess the informativeness of different curriculum sources, benchmark general-purpose LLMs on curriculum-to-competency mapping, and analyze their error patterns. We further introduce a reasoning-based prompting strategy, Curricular CoT, to strengthen LLMs' pedagogical reasoning. Our results show that detailed instructional activity descriptions are the most informative type of curriculum document for competency analytics. Open-weight LLMs achieve accuracy comparable to proprietary models on coarse-grained tasks, demonstrating their scalability and cost-effectiveness for institutional use. However, no model reaches human-level precision in fine-grained pedagogical reasoning. Our proposed Curricular CoT yields modest improvements by reducing bias in instructional keyword inference and improving the detection of nuanced pedagogical evidence in long texts. Together, these findings highlight the untapped potential of institutional curriculum documents and provide an empirical foundation for advancing AI-driven curricular analytics.