We adapt the Parameterized Environment Response Model (PERM), a method for training both reinforcement learning (RL) agents and human learners in parameterized environments by directly modeling environment difficulty and learner ability. Inspired by Item Response Theory (IRT), PERM matches environment difficulty to each learner's ability, producing a curriculum grounded in the Zone of Proximal Development (ZPD). PERM requires no real-time RL updates and supports offline training, which lets it adapt to a wide range of students. We present a two-stage training process that exploits this adaptability and demonstrate its effectiveness for training both RL agents and humans in an empirical study.
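The abstract does not specify PERM's response model, so the sketch below only illustrates the IRT idea it builds on. It assumes a one-parameter logistic (Rasch) model, a fixed pool of candidate environments that each have a known scalar difficulty, and a Gaussian prior on ability. It estimates a learner's ability from past pass/fail outcomes and then picks the environment whose predicted success probability is closest to a target. All function names, the prior, and the 0.5 target are illustrative assumptions and are not taken from PERM.

```python
import math


def p_success(ability: float, difficulty: float) -> float:
    """Rasch (1PL IRT) probability that a learner succeeds on an environment."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def estimate_ability(difficulties, outcomes, prior_var=1.0, lr=0.1, steps=500):
    """Hypothetical MAP ability estimate via gradient ascent.

    Difficulties are held fixed. The Gaussian prior (assumed, variance
    prior_var) keeps the estimate finite when all outcomes are 0 or all are 1.
    """
    theta = 0.0
    for _ in range(steps):
        # Gradient of the Rasch log-likelihood: sum of (observed - predicted).
        grad = sum(y - p_success(theta, b) for b, y in zip(difficulties, outcomes))
        grad -= theta / prior_var  # gradient of the Gaussian log-prior
        theta += lr * grad
    return theta


def next_environment(candidates, ability, target_p=0.5):
    """Pick the candidate whose predicted success is closest to target_p.

    Targeting a moderate success rate is one assumed way to stay in the
    learner's Zone of Proximal Development.
    """
    return min(
        candidates,
        key=lambda env: abs(p_success(ability, env["difficulty"]) - target_p),
    )


if __name__ == "__main__":
    # Hypothetical history: succeeded on difficulties -1 and 0, failed on 1.
    theta = estimate_ability([-1.0, 0.0, 1.0], [1, 1, 0])
    envs = [
        {"name": "easy", "difficulty": -2.0},
        {"name": "mid", "difficulty": 0.25},
        {"name": "hard", "difficulty": 2.0},
    ]
    print(f"estimated ability: {theta:.2f}")
    print("next environment:", next_environment(envs, theta)["name"])
```

With `target_p=0.5`, the selection reduces to picking the environment whose difficulty is nearest the ability estimate. Raising or lowering the target shifts the curriculum toward easier or harder environments. Whether PERM uses a comparable selection rule is not stated in the abstract.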