We present The Matrix, the first foundational realistic world simulator capable of generating continuous 720p high-fidelity real-scene video streams with real-time, responsive control in both first- and third-person perspectives, enabling immersive exploration of richly dynamic environments. Trained on limited supervised data from AAA games such as Forza Horizon 5 and Cyberpunk 2077, complemented by large-scale unsupervised footage from real-world settings such as Tokyo streets, The Matrix allows users to traverse diverse terrains (deserts, grasslands, water bodies, and urban landscapes) in continuous, uncut hour-long sequences. Operating at 16 FPS, the system supports real-time interactivity and demonstrates zero-shot generalization, translating virtual game environments to real-world contexts where collecting continuous movement data is often infeasible. For example, The Matrix can simulate a BMW X3 driving through an office setting, an environment present in neither the gaming data nor the real-world footage. This approach showcases the potential of AAA game data to advance robust world models, bridging the gap between simulations and real-world applications in scenarios with limited data.