Complex Instruction Following with Diverse Style Policies in Football Games

Despite advances in language-controlled reinforcement learning (LC-RL) for basic domains and straightforward commands (e.g., object manipulation and navigation), extending LC-RL to comprehend and execute high-level or abstract instructions in complex, multi-agent environments such as football games remains a significant challenge. To address this gap, we introduce Language-Controlled Diverse Style Policies (LCDSP), a novel LC-RL paradigm designed for complex scenarios. LCDSP comprises two key components: a Diverse Style Training (DST) method and a Style Interpreter (SI). The DST method efficiently trains a single policy that can exhibit a wide range of behaviors by modulating agent actions through style parameters (SP). The SI translates high-level language instructions into the corresponding SP accurately and rapidly. Through extensive experiments in a complex 5v5 football environment, we demonstrate that LCDSP comprehends abstract tactical instructions and accurately executes the desired behavioral styles, showcasing its potential for complex, real-world applications.
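
The two-stage flow described in the abstract (instruction to style parameters, then style parameters to behavior) can be illustrated with a toy sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the style dimensions (`aggression`, `passing`, `defensive_depth`), the keyword table standing in for the Style Interpreter, the action set, and the affinity weights are all hypothetical. The actual SI and the DST-trained policy are learned components; this only shows how a single policy can change its behavior when conditioned on different SP.

```python
from dataclasses import dataclass

# Hypothetical style dimensions; the abstract does not list the real ones.
STYLE_DIMS = ("aggression", "passing", "defensive_depth")


@dataclass(frozen=True)
class StyleParams:
    """Style parameters (SP); each value is assumed to lie in [0, 1]."""
    aggression: float = 0.5
    passing: float = 0.5
    defensive_depth: float = 0.5


# Toy keyword table standing in for the Style Interpreter (SI).
# This lookup is an illustrative placeholder, not the paper's method.
KEYWORD_RULES = {
    "attack": {"aggression": 0.9, "defensive_depth": 0.2},
    "press": {"aggression": 0.8},
    "defend": {"aggression": 0.2, "defensive_depth": 0.9},
    "pass": {"passing": 0.9},
}


def interpret(instruction: str) -> StyleParams:
    """Map a high-level instruction to SP, starting from neutral values."""
    values = {dim: 0.5 for dim in STYLE_DIMS}
    text = instruction.lower()
    for keyword, updates in KEYWORD_RULES.items():
        if keyword in text:
            values.update(updates)
    return StyleParams(**values)


# Hypothetical affinity of each action to each style dimension.
ACTION_AFFINITY = {
    "shoot": (1.0, -0.5, -0.5),
    "short_pass": (0.0, 1.0, 0.0),
    "hold_position": (-1.0, 0.0, 1.0),
}


def act(base_scores: dict, sp: StyleParams) -> str:
    """Single style-conditioned policy: base score plus an SP-weighted bonus."""
    style_vec = (sp.aggression, sp.passing, sp.defensive_depth)

    def score(action: str) -> float:
        affinity = ACTION_AFFINITY[action]
        return base_scores[action] + sum(a * s for a, s in zip(affinity, style_vec))

    return max(base_scores, key=score)


if __name__ == "__main__":
    neutral = {"shoot": 0.0, "short_pass": 0.0, "hold_position": 0.0}
    for text in ("Attack aggressively", "Defend deep", "Keep possession and pass"):
        print(text, "->", act(neutral, interpret(text)))
```

With identical base scores, the same `act` function selects different actions depending only on the SP produced from the instruction, which is the sense in which one policy exhibits multiple styles.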

