Access Timing as Scaffolding: A Reinforcement Learning Approach to GenAI in Education

In recent years, generative AI (GenAI) has become ubiquitous in students' daily lives, despite its potential to induce over-reliance, metacognitive disengagement, and diminished learning when used without restriction. While most prior research has focused on how to scaffold its use pedagogically, the question of when to allow access to off-the-shelf GenAI remains understudied and lacks pedagogically grounded empirical investigation. We treat access timing itself as a form of implicit scaffolding and operationalize it through a reinforcement learning (RL) agent that decides when students should access GenAI, with a reward function grounded in metacognitive theory, cognitive load theory, and productive failure. In a mixed-methods controlled lab study with N=105 participants, we compared the agent's effect on learning gains and metacognitive engagement with that of unrestricted and fully restricted use. Results show that strategically timed GenAI access under the RL condition improved objective post-test performance and metacognitive accuracy compared with unrestricted access, while reducing task errors and time on task relative to complete withholding, all without the need for explicit metacognitive prompts or structured scaffolding. However, no between-condition differences emerged in self-reported metacognitive awareness. Overall, these results suggest that timing GenAI access is a tractable, theoretically grounded, and scalable pedagogical paradigm that improves on both unrestricted and fully withheld access, is compatible with off-the-shelf tools, and potentially has a low adoption barrier. This opens up a new research area that explores how access timing can be facilitated by educators and implemented in the design of human-AI learning systems.
