CXL-based Computational Memory (CCM) enables near-memory processing within expanded remote memory, offering opportunities to reduce data movement costs in disaggregated memory systems and to improve overall performance. However, existing offloading mechanisms do not fully exploit the trade-offs among offload models built on different CXL protocols. This work first examines these trade-offs and their impact on end-to-end performance and system efficiency for workloads with diverse data and computation characteristics. We propose Asynchronous Back-Streaming, a new offloading protocol that coordinates CXL.io and CXL.mem to enable result back-streaming and asynchronous pipelining across CCM and host tasks. We further design AXLE, a system that realizes this protocol with lightweight host-CCM interaction. Overall, AXLE reduces end-to-end runtime by up to 50.14%, reduces CCM and host idle times by 14.53x and 3.93x on average, respectively, and achieves up to a 6x reduction in host core stall time.