GPU-based simulation environments for embodied AI interleave physics simulation (CUDA) and photorealistic rendering (Vulkan) on a single device. We observe that two foundational scenarios -- simulation data generation and RL training -- can be naturally adapted to execute their simulation and rendering phases concurrently, presenting a significant opportunity to improve GPU utilization through spatial multiplexing. However, a fundamental obstacle we term execution isolation prevents this: CUDA and Vulkan create separate GPU contexts whose channels are bound to different scheduling groups, confining compute and graphics to mutually exclusive time slices. Existing spatial-sharing techniques are limited to the CUDA ecosystem, while temporal-sharing approaches underutilize available resources. This paper presents VUDA, a system that breaks execution isolation to enable spatial parallelism between CUDA compute and Vulkan graphics workloads. VUDA is built on two key observations: although CUDA and Vulkan expose different programming abstractions, their execution paths converge to a common channel primitive at the driver and hardware level; meanwhile, their virtual-address spaces are inherently disjoint, making safe page-table merging feasible without remapping. VUDA exposes a thin API for developers to annotate co-schedulable CUDA streams, and realizes spatial sharing through channel redirection into Vulkan's scheduling domain and page-table grafting to unify address spaces, eliminating all data copying on the critical path. Experiments on representative embodied-AI workloads show that VUDA delivers up to 85% higher throughput than temporal-sharing baselines, while improving GPU utilization and reducing end-to-end latency.