Creating systems capable of generating virtually infinite variations of complex and novel behaviour, without predetermined goals or limits, is a major challenge in AI. This challenge has been addressed through the development of open-ended algorithms that continuously generate new and diverse behaviours, such as POET and Enhanced-POET, which co-evolve environments and agent behaviour. One challenge with existing methods, however, is that they struggle to continuously generate complex environments. In this work, we propose LLM-POET, a modification of the POET algorithm in which the environment is both created and mutated using a Large Language Model (LLM). By fine-tuning an LLM on text representations of Evolution Gym environments paired with captions describing them, we were able to generate complex and diverse environments from natural language. We found that not only could the LLM produce a diverse range of environments, but, compared to the CPPNs used for environment generation in Enhanced-POET, it yielded a 34% increase in the performance gain of co-evolution. This improvement suggests that the agents learned a more diverse set of skills by training on more complex environments.