Generative AI (GenAI) is increasingly used as a knowledge partner in higher education, raising the need for instructional designs that emphasize AI literacy practices such as evaluating output credibility and maintaining human accountability. Existing AI literacy frameworks focus more on what learners should do than on how these practices are enacted in routine student-GenAI collaboration. We address this gap by framing student-GenAI interaction as a transactive memory partnership, where credibility regulates reliance and verification. To make this process visible during coursework, we used a weaker large language model (LLM): small enough to run on most students' computers during class, helpful enough to support learning, but not so capable that it removes the need for verification. In an undergraduate STEM course, students were randomly assigned to one of three conditions across repeated activities: reflection-first (think first, then consult AI), verification-required (use AI, then evaluate the output), or control (unrestricted use). Students completed a transactive memory survey at three time points (N = 42). Weighted credibility diverged by condition over time. ANCOVA controlling for baseline credibility showed a condition effect at mid-semester, F(2, 38) = 4.02, p = .026, partial eta squared = .175, and a stronger effect at post-intervention, F(2, 38) = 5.48, p = .008, partial eta squared = .224; adjusted means were lowest in reflection-first, intermediate in verification-required, and highest in control. Parallel analyses of specialization and coordination were not significant. These findings suggest that workflow sequencing, deliberate use of weaker LLMs, and accountability cues embedded in assignment instructions can recalibrate students' credibility judgments in GenAI use, with reflection-first producing the strongest downward shift in reliance.
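The effect sizes reported above can be sanity-checked from the F statistics alone. This minimal sketch assumes the standard identity for a fixed-effects F test, partial eta squared = (F × df_effect) / (F × df_effect + df_error). The function name `partial_eta_squared` and the `reported` dictionary are illustrative and do not come from the study's analysis code. The error degrees of freedom of 38 are also consistent with N = 42, three conditions, and one covariate (42 − 3 − 1 = 38).

```python
def partial_eta_squared(f_stat: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from an F statistic and its degrees of freedom.

    Uses the identity eta_p^2 = (F * df_effect) / (F * df_effect + df_error),
    which follows from eta_p^2 = SS_effect / (SS_effect + SS_error).
    """
    if f_stat < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and degrees of freedom positive")
    numerator = f_stat * df_effect
    return numerator / (numerator + df_error)


# Values copied from the abstract: (F, df_effect, df_error, reported eta_p^2).
# The dictionary itself is a hypothetical container used only for this check.
reported = {
    "mid-semester": (4.02, 2, 38, 0.175),
    "post-intervention": (5.48, 2, 38, 0.224),
}

for label, (f_stat, df1, df2, stated) in reported.items():
    recovered = partial_eta_squared(f_stat, df1, df2)
    print(f"{label}: recovered {recovered:.3f}, reported {stated:.3f}")

# mid-semester: recovered 0.175, reported 0.175
# post-intervention: recovered 0.224, reported 0.224
```

Both recovered values match the reported ones to three decimals, so the F statistics, degrees of freedom, and effect sizes in the abstract are internally consistent.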