Multi-task policy search is challenging because policies must generalize beyond the training cases. Curriculum learning has proven effective in this setting, as it introduces complexity progressively. However, designing effective curricula is labor-intensive and requires extensive domain expertise. LLM-based curriculum generation has only recently emerged as a potential solution, but it has so far been limited to static, offline modes that do not leverage real-time feedback from the optimizer. Here we propose an interactive LLM-assisted framework for online curriculum generation, in which the LLM adaptively designs training cases based on real-time feedback from the evolutionary optimization process. We investigate how different feedback modalities, ranging from numeric metrics alone to combinations with plots and behavior visualizations, influence the LLM's ability to generate meaningful curricula. Through a 2D robot navigation case study, with genetic programming as the optimizer, we evaluate our approach against static LLM-generated curricula and expert-designed baselines. We show that interactive curriculum generation outperforms static approaches, with multimodal feedback incorporating both progression plots and behavior visualizations yielding performance competitive with expert-designed curricula. This work contributes to understanding how LLMs can serve as interactive curriculum designers for embodied AI systems, with potential extensions to broader evolutionary robotics applications.