We propose a novel computing runtime that exposes remote compute devices through OpenCL, the cross-vendor open standard for heterogeneous computing, and executes compute tasks on the multi-access edge computing (MEC) cluster side across multiple servers in a scalable manner. Intermittent loss of the user equipment (UE) connection is handled gracefully, even if the device's IP address changes in the process. Network-induced latency is minimized by transferring data and signaling command completions peer-to-peer between remote devices, directly to the target server. The streamlined TCP-based protocol adds a command latency of only 60 microseconds on top of the network round-trip latency in synthetic benchmarks. The runtime can also use RDMA to speed up inter-server data transfers by a further 60% compared to the TCP-based solution. The benefits of the proposed runtime in MEC applications are demonstrated with a smartphone-based augmented reality (AR) rendering case study. Measurements show up to a 19x improvement in frame rate and a 17x improvement in local energy consumption when the proposed runtime is used to offload AR rendering from a smartphone. Scalability to multiple GPU servers in real-world applications is shown with a computational fluid dynamics simulation, which scales with the number of servers at roughly 80% efficiency, comparable to an MPI port of the same simulation.
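To make the scaling claim concrete, the following minimal sketch computes parallel efficiency under the common definition of speedup divided by the number of servers. This definition is an assumption here, since the abstract does not state which one is used. The function name `parallel_efficiency` and the timings are hypothetical, chosen only so that the result lands at 80%; they are not measurements from the case study.

```python
def parallel_efficiency(t_single: float, t_parallel: float, n_servers: int) -> float:
    """Return speedup per server, assuming efficiency = (t_single / t_parallel) / n_servers."""
    speedup = t_single / t_parallel
    return speedup / n_servers


# Hypothetical wall-clock times in seconds (illustrative only, not measured data).
t1 = 100.0   # simulation on one GPU server
t4 = 31.25   # same simulation spread over four GPU servers

eff = parallel_efficiency(t1, t4, 4)
print(f"speedup {t1 / t4:.2f}x, efficiency {eff:.0%}")
```

With these illustrative numbers, four servers give a 3.2x speedup, which corresponds to 80% efficiency; perfect linear scaling would give 4x and 100%.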