Generative AI as a Design Variable: An Evidence-Centered Framework for Principled Governance in STEM Assessment

Generative Artificial Intelligence (GenAI) presents a governance challenge for STEM assessment. Unrestricted GenAI access enables task outsourcing that undermines the validity of traditional assessments, while blanket prohibitions are difficult to enforce, may push use underground, and do little to prepare students for workplaces where GenAI-supported workflows are increasingly common. This paper addresses the dilemma by proposing a framework grounded in Evidence-Centered Design (ECD) that treats GenAI as a design variable within the assessment argument rather than as an external threat to it. The framework analyzes how GenAI reshapes the student model, evidence model, and task model, and uses this analysis to articulate three principled governance stances. Restrict is warranted when GenAI would contaminate the inferential link between student work products and the targeted unaided proficiency. Scaffold is warranted when bounded GenAI support can assist with peripheral demands without revealing the target construct, thereby preserving inferential interpretability. Require is warranted when the target construct is disciplinary AI interaction competency and tasks can be designed to elicit process artifacts, including prompts, critiques, and revisions, that make student reasoning observable, scorable, and distinguishable from AI-generated output. Together, these stances specify when to restrict, scaffold, or require GenAI use in STEM assessment. We present two task designs deployed in an introductory physics course and demonstrate that disciplinary AI interaction competencies are observable in student response artifacts and can be scored using defensible rubrics grounded in student data and expert knowledge. By situating GenAI governance within validity arguments, the framework offers actionable guidance for preserving learning integrity while supporting authentic preparation for AI-enabled professional environments.
