There has been growing interest in developing learner models to enhance learning and teaching experiences in educational environments. However, existing work has primarily focused on structured environments that rely on meticulously crafted task representations, which limits an agent's ability to generalize skills across tasks. In this paper, we aim to enhance the generalization capabilities of agents in open-ended text-based learning environments by integrating Reinforcement Learning (RL) with Large Language Models (LLMs). We investigate three types of agents: (i) RL-based agents that use natural language for state and action representations to find the best interaction strategy, (ii) LLM-based agents that leverage the model's general knowledge and reasoning through prompting, and (iii) hybrid LLM-assisted RL agents that combine these two strategies to improve performance and generalization. To support the development and evaluation of these agents, we introduce PharmaSimText, a novel benchmark derived from the PharmaSim virtual pharmacy environment designed for practicing diagnostic conversations. Our results show that RL-based agents excel at task completion but ask lower-quality diagnostic questions. In contrast, LLM-based agents ask better diagnostic questions but often fail to complete the task. Finally, hybrid LLM-assisted RL agents overcome both limitations, highlighting the potential of combining RL and LLMs to develop high-performing agents for open-ended learning environments.