PoCL-R: An Open Standard Based Offloading Layer for Heterogeneous Multi-Access Edge Computing with Server Side Scalability

We propose a novel computing runtime that exposes remote compute devices via the cross-vendor open heterogeneous computing standard OpenCL and can execute compute tasks on the multi-access edge computing (MEC) cluster side across multiple servers in a scalable manner. Intermittent loss of the user equipment (UE) connection is handled gracefully, even if the device's IP address changes along the way. Network-induced latency is minimized by transferring data and signaling command completions between remote devices in a peer-to-peer fashion, directly to the target server, using a streamlined TCP-based protocol. In synthetic benchmarks, this protocol yields a command latency of only 60 microseconds on top of the network round-trip latency. The runtime can also utilize RDMA to speed up inter-server data transfers by an additional 60% compared to the TCP-based solution. The benefits of the proposed runtime in MEC applications are demonstrated with a smartphone-based augmented reality (AR) rendering case study. Measurements show up to a 19x improvement in frame rate and a 17x improvement in local energy consumption when the proposed runtime is used to offload AR rendering from a smartphone. Scalability to multiple GPU servers in real-world applications is shown with a computational fluid dynamics simulation, which scales with the number of servers at roughly 80% efficiency, comparable to an MPI port of the same simulation.
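The two headline figures can be read with a little arithmetic. The following is a minimal sketch, not code from the paper. It assumes that command latency is the network round trip plus a fixed runtime overhead, and that "efficiency" means measured speedup divided by the number of servers. The function names and all timing values are hypothetical and chosen only for illustration.

```python
def command_latency_us(rtt_us: float, runtime_overhead_us: float = 60.0) -> float:
    """Command latency as network round trip plus runtime overhead.

    The 60 microsecond default is the synthetic-benchmark figure quoted in
    the abstract; the additive model itself is an assumption of this sketch.
    """
    return rtt_us + runtime_overhead_us


def parallel_efficiency(t_single_s: float, t_multi_s: float, num_servers: int) -> float:
    """Speedup over a single server divided by the number of servers.

    This is the usual strong-scaling definition, assumed here to be what
    the abstract means by "efficiency".
    """
    if num_servers < 1:
        raise ValueError("num_servers must be at least 1")
    speedup = t_single_s / t_multi_s
    return speedup / num_servers


if __name__ == "__main__":
    # Hypothetical network: 500 microsecond round trip.
    print(command_latency_us(500.0))
    # Hypothetical run times: 100 s on one server, 31.25 s on four servers.
    print(parallel_efficiency(100.0, 31.25, 4))
```

Under these assumptions, an efficiency of 0.8 on four servers corresponds to a speedup of 3.2x over a single server.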

