Learning to Play Blackjack: A Curriculum Learning Perspective

Reinforcement Learning (RL) agents often struggle with efficiency and performance in complex environments. We propose a novel framework that uses a Large Language Model (LLM) to dynamically generate a curriculum over the available actions, so that the agent incorporates each action individually. We apply this framework to the game of Blackjack, where the LLM creates a multi-stage training path that progressively introduces more complex actions to a tabular Q-learning agent and a Deep Q-Network (DQN) agent. Our evaluation in a realistic 8-deck simulation over 10 independent runs demonstrates significant performance gains over standard training methods. The curriculum-based approach increases the DQN agent's average win rate from 43.97% to 47.41%, reduces the average bust rate from 32.9% to 28.0%, and accelerates the overall workflow by over 74%, with the agent's full training completing faster than the baseline's evaluation phase alone. These results validate that LLM-guided curricula can build more effective, robust, and efficient RL agents.
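To make the idea of a curriculum over actions concrete, the following is a minimal sketch of staged tabular Q-learning, not the authors' implementation. Everything in it is an assumption made for illustration: the stage order in `STAGES` is hand-written rather than LLM-generated, the game is a simplified infinite-deck Blackjack without splits or natural-blackjack payouts, the state ignores soft hands, and the names `run_episode` and `train` are hypothetical. The point it shows is that the Q-table is carried from one stage to the next while the set of allowed actions grows.

```python
import random
from collections import defaultdict

CARDS = [2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10, 11]  # infinite deck, ace = 11

# Hypothetical stage order; in the paper the curriculum comes from an LLM.
STAGES = [("stand", "hit"), ("stand", "hit", "double")]

def hand_total(cards):
    """Best total, counting aces as 1 whenever 11 would bust."""
    total, aces = sum(cards), cards.count(11)
    while total > 21 and aces:
        total, aces = total - 10, aces - 1
    return total

def settle(player, dealer, rng, stake):
    """Play out the dealer (hits below 17) and return the signed payoff."""
    p = hand_total(player)
    if p > 21:
        return -stake
    while hand_total(dealer) < 17:
        dealer.append(rng.choice(CARDS))
    d = hand_total(dealer)
    if d > 21 or p > d:
        return stake
    return 0.0 if p == d else -stake

def run_episode(q, allowed, rng, epsilon=0.1, alpha=0.05):
    """One hand of Q-learning restricted to the actions in `allowed`."""
    player = [rng.choice(CARDS), rng.choice(CARDS)]
    dealer = [rng.choice(CARDS)]
    while True:
        # State: (player total, dealer upcard, is this the first decision?)
        state = (hand_total(player), dealer[0], len(player) == 2)
        # Doubling is only legal on the first two cards.
        legal = [a for a in allowed if a != "double" or len(player) == 2]
        if rng.random() < epsilon:
            action = rng.choice(legal)
        else:
            action = max(legal, key=lambda a: q[(state, a)])
        if action == "hit":
            player.append(rng.choice(CARDS))
            if hand_total(player) <= 21:
                # Non-terminal step: zero reward, bootstrap from the next state.
                nxt = (hand_total(player), dealer[0], False)
                target = max(q[(nxt, a)] for a in allowed if a != "double")
                q[(state, action)] += alpha * (target - q[(state, action)])
                continue
            reward = -1.0  # bust
        elif action == "double":
            player.append(rng.choice(CARDS))
            reward = settle(player, dealer, rng, 2.0)
        else:  # stand
            reward = settle(player, dealer, rng, 1.0)
        q[(state, action)] += alpha * (reward - q[(state, action)])
        return reward

def train(episodes_per_stage=20000, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # the same Q-table is reused across all stages
    for allowed in STAGES:
        for _ in range(episodes_per_stage):
            run_episode(q, allowed, rng)
    return q
```

Calling `q = train()` returns the learned table, keyed by `(state, action)`. Because a stage only ever reads or writes entries for its allowed actions, values learned for `stand` and `hit` in the first stage are already in place when `double` is introduced, which is the intuition behind introducing actions one at a time.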

