In this study, we address the problem of enabling an artificial intelligence agent to execute complex language instructions in virtual environments. We assume that these instructions involve intricate linguistic structures and multiple interdependent tasks, all of which must be completed to achieve the desired outcome. To manage this complexity, we propose a hierarchical framework that combines the deep language comprehension of large language models (LLMs) with the adaptive action-execution capabilities of reinforcement learning agents. The language module, based on an LLM, translates the language instruction into a high-level action plan, which is then executed by a pre-trained reinforcement learning agent. We demonstrate the effectiveness of our approach in two environments: IGLU, where agents are instructed to build structures, and Crafter, where agents perform tasks and interact with objects in the surrounding environment according to language commands.
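The two-level control flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Subtask`, `plan_from_instruction`, `execute_plan`, `fake_llm`, `fake_policy`, and `make_toy_env` are all hypothetical, the LLM and the pre-trained policy are replaced by toy stand-ins, and the plan format (one subtask per line) is an assumption made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Subtask:
    name: str  # hypothetical subtask label, e.g. "collect_wood"


def plan_from_instruction(instruction: str, llm: Callable[[str], str]) -> List[Subtask]:
    """High level: ask the language module for a plan, one subtask per line (assumed format)."""
    raw = llm(f"Instruction: {instruction}\nList the subtasks, one per line.")
    return [Subtask(line.strip()) for line in raw.splitlines() if line.strip()]


def execute_plan(
    plan: List[Subtask],
    policy: Callable[[Subtask], str],
    env_step: Callable[[Subtask, str], bool],
    max_steps_per_subtask: int = 50,
) -> List[str]:
    """Low level: run the policy on each subtask in order and return the completed names."""
    completed = []
    for subtask in plan:
        for _ in range(max_steps_per_subtask):
            action = policy(subtask)           # the policy is conditioned on the current subtask
            if env_step(subtask, action):      # the environment reports subtask completion
                completed.append(subtask.name)
                break
        else:
            break  # step budget exhausted: stop, since later subtasks may depend on this one
    return completed


# Toy stand-ins (assumptions, not real models or environments).
def fake_llm(prompt: str) -> str:
    return "collect_wood\nplace_table\n"


def fake_policy(subtask: Subtask) -> str:
    return f"do:{subtask.name}"


def make_toy_env(steps_needed: int = 2) -> Callable[[Subtask, str], bool]:
    progress: Dict[str, int] = {}

    def env_step(subtask: Subtask, action: str) -> bool:
        # A subtask counts as done after `steps_needed` actions have been taken on it.
        progress[subtask.name] = progress.get(subtask.name, 0) + 1
        return progress[subtask.name] >= steps_needed

    return env_step


plan = plan_from_instruction("Make a table", fake_llm)
completed = execute_plan(plan, fake_policy, make_toy_env())
```

In this sketch the planner is called once per instruction and the policy is called many times per subtask, which mirrors the division of labor between the language module and the reinforcement learning agent. Stopping at the first failed subtask is one possible design choice for interdependent tasks, not something the abstract specifies.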