Quality-Diversity is a branch of stochastic optimization that is often applied to problems from the Reinforcement Learning and control domains in order to construct repertoires of well-performing policies/skills that exhibit diversity with respect to a behavior space. Such archives usually consist of a finite number of reactive agents, each associated with a unique behavior descriptor, and instantiating behavior descriptors outside this coarsely discretized space is not straightforward. While a few recent works address this issue, the trajectories they generate cannot easily be customized beyond specifying a target behavior descriptor. We propose to address both problems jointly in environments where semantic information about static scene elements is available. To do so, we use a large language model (LLM) to augment the repertoire with natural-language descriptions of trajectories, and we train a policy conditioned on those descriptions. Our method thus allows a user not only to specify an arbitrary target behavior descriptor but also to provide a high-level textual prompt that shapes the generated trajectory. We also propose an LLM-based approach to evaluating the performance of such generative agents. Furthermore, we develop a benchmark based on simulated robot navigation in a 2D maze, which we use for experimental validation.
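The limitation described above, where a repertoire only holds one agent per cell of a coarsely discretized behavior space, can be illustrated with a generic MAP-Elites-style grid archive. The following is a minimal sketch under stated assumptions, not the authors' implementation. The names `Elite`, `GridArchive`, `cell_index`, `add`, and `lookup` are hypothetical. The behavior descriptor is assumed to be a continuous vector, such as a final (x, y) position in a 2D maze.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

import numpy as np


@dataclass
class Elite:
    params: np.ndarray      # policy parameters (hypothetical representation)
    fitness: float          # task performance of the policy
    descriptor: np.ndarray  # behavior descriptor, e.g. final (x, y) position


class GridArchive:
    """Hypothetical MAP-Elites-style archive: at most one elite per grid cell."""

    def __init__(self, low, high, cells_per_dim):
        self.low = np.asarray(low, dtype=float)
        self.high = np.asarray(high, dtype=float)
        self.cells = np.asarray(cells_per_dim, dtype=int)
        self.elites: Dict[Tuple[int, ...], Elite] = {}

    def cell_index(self, descriptor) -> Tuple[int, ...]:
        # Normalize each dimension to [0, 1], scale to the grid resolution,
        # and clip so boundary or out-of-range values land in edge cells.
        d = (np.asarray(descriptor, dtype=float) - self.low) / (self.high - self.low)
        idx = np.clip(np.floor(d * self.cells).astype(int), 0, self.cells - 1)
        return tuple(int(i) for i in idx)

    def add(self, elite: Elite) -> bool:
        # Keep the new elite only if its cell is empty or it beats the incumbent.
        key = self.cell_index(elite.descriptor)
        current = self.elites.get(key)
        if current is None or elite.fitness > current.fitness:
            self.elites[key] = elite
            return True
        return False

    def lookup(self, descriptor) -> Optional[Elite]:
        # A target descriptor can only resolve to the single elite stored in
        # its cell (or None): the archive cannot realize the exact target.
        return self.elites.get(self.cell_index(descriptor))
```

For example, with `GridArchive([0, 0], [1, 1], [10, 10])`, the descriptors `(0.12, 0.34)` and `(0.15, 0.31)` both fall into cell `(1, 3)`. A query for either one returns the same stored agent. This illustrates why instantiating arbitrary descriptors, or further shaping the resulting trajectory, requires something beyond a lookup in the repertoire.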