Interactive LLM-assisted Curriculum Learning for Multi-Task Evolutionary Policy Search

Multi-task policy search is a challenging problem because policies are required to generalize beyond their training cases. Curriculum learning has proven effective in this setting, as it introduces complexity progressively. However, designing effective curricula is labor-intensive and requires extensive domain expertise. LLM-based curriculum generation has only recently emerged as a potential solution, but it has so far been limited to static, offline modes that do not use real-time feedback from the optimizer. Here we propose an interactive LLM-assisted framework for online curriculum generation, in which the LLM adaptively designs training cases based on real-time feedback from the evolutionary optimization process. We investigate how different feedback modalities, ranging from numeric metrics alone to combinations with plots and behavior visualizations, influence the LLM's ability to generate meaningful curricula. Through a 2D robot navigation case study, with genetic programming as the optimizer, we evaluate our approach against static LLM-generated curricula and expert-designed baselines. We show that interactive curriculum generation outperforms static approaches, and that multimodal feedback incorporating both progression plots and behavior visualizations yields performance competitive with expert-designed curricula. This work contributes to understanding how LLMs can serve as interactive curriculum designers for embodied AI systems, with potential extensions to broader evolutionary robotics applications.

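The interactive loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: a rule-based function `propose_cases` stands in for the LLM curriculum designer, a simple elitist evolution strategy over a single controller gain stands in for genetic programming, the feedback is numeric only (no plots or behavior visualizations), and the 2D navigation task is a toy point robot. All names and thresholds are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingCase:
    """Hypothetical training case: a goal in the plane; the robot starts at the origin."""
    goal_x: float
    goal_y: float


def final_distance(gain, case, steps=15, max_step=1.0):
    """Roll out a proportional controller and return the final distance to the goal."""
    x = y = 0.0
    for _ in range(steps):
        # Per-axis move proportional to the remaining error, clipped to max_step.
        x += max(-max_step, min(max_step, gain * (case.goal_x - x)))
        y += max(-max_step, min(max_step, gain * (case.goal_y - y)))
    return ((case.goal_x - x) ** 2 + (case.goal_y - y) ** 2) ** 0.5


def fitness(gain, cases):
    """Mean final distance over the current training cases (lower is better)."""
    return sum(final_distance(gain, c) for c in cases) / len(cases)


def propose_cases(feedback):
    """Rule-based stand-in for the LLM designer (assumption, not the paper's prompt).

    Harder cases are proposed only once the optimizer reports that the
    current ones are nearly solved; otherwise the difficulty is kept.
    """
    if feedback is None:
        difficulty = 1.0
    elif feedback["best_fitness"] < 0.1:
        difficulty = feedback["difficulty"] + 2.0
    else:
        difficulty = feedback["difficulty"]
    cases = [
        TrainingCase(difficulty, 0.0),
        TrainingCase(0.0, difficulty),
        TrainingCase(-difficulty, difficulty),
    ]
    return difficulty, cases


def evolve_one_generation(population, cases, rng, sigma=0.1):
    """Elitist (mu + lambda) step: mutate every parent, keep the best half."""
    offspring = [g + rng.gauss(0.0, sigma) for g in population]
    ranked = sorted(population + offspring, key=lambda g: fitness(g, cases))
    return ranked[: len(population)]


def run_interactive_curriculum(stages=4, gens_per_stage=10, seed=0):
    """Alternate between curriculum design and optimization, feeding results back."""
    rng = random.Random(seed)
    population = [rng.uniform(0.0, 0.2) for _ in range(8)]
    feedback, history = None, []
    for stage in range(stages):
        difficulty, cases = propose_cases(feedback)  # designer sees last feedback
        for _ in range(gens_per_stage):
            population = evolve_one_generation(population, cases, rng)
        best = min(fitness(g, cases) for g in population)
        feedback = {"stage": stage, "difficulty": difficulty, "best_fitness": best}
        history.append(feedback)
    return history


if __name__ == "__main__":
    for record in run_interactive_curriculum():
        print(record)
```

A static, offline curriculum would correspond to fixing the whole sequence of cases before optimization starts; in this sketch the only difference is that `propose_cases` reads the `feedback` dictionary produced by the previous stage before choosing the next cases.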
