Personalized Learning Path Planning (PLPP) aims to design adaptive learning paths that align with individual goals. While large language models (LLMs) show potential in personalizing learning experiences, existing approaches often lack mechanisms for goal-aligned planning. We introduce Pxplore, a novel framework for PLPP that integrates a reinforcement-based training paradigm and an LLM-driven educational architecture. We design a structured learner state model and an automated reward function that transforms abstract objectives into computable signals. We train the policy combining supervised fine-tuning (SFT) and Group Relative Policy Optimization (GRPO), and deploy it within a real-world learning platform. Extensive experiments validate Pxplore's effectiveness in producing coherent, personalized, and goal-driven learning paths. We release our code and dataset at https://github.com/Pxplore/pxplore-algo.