As large language models (LLMs) continue to improve in reasoning and decision-making, there is a growing need for realistic, interactive environments in which their abilities can be rigorously evaluated. We present VirtualEnv, a simulation platform built on Unreal Engine 5 that enables fine-grained benchmarking of LLMs in embodied and interactive scenarios. VirtualEnv supports rich agent-environment interactions, including object manipulation, navigation, and adaptive multi-agent collaboration, as well as game-inspired mechanics such as escape rooms and procedurally generated environments. We provide a user-friendly API built on top of Unreal Engine that allows researchers to deploy and control LLM-driven agents using natural language instructions. We integrate large-scale LLMs and vision-language models (VLMs), such as GPT-based models, to generate novel environments and structured tasks from multimodal inputs. Our experiments benchmark several popular LLMs across tasks of increasing complexity and analyze differences in adaptability, planning, and multi-agent coordination. We also describe our methodology for procedural task generation, task validation, and real-time environment control. By releasing VirtualEnv as an open-source platform, we aim to advance research at the intersection of AI and gaming, enable standardized evaluation of LLMs in embodied AI settings, and pave the way for future developments in immersive simulations and interactive entertainment.