Compute eXpress Link (CXL) has emerged as a key enabler of memory disaggregation, allowing future heterogeneous computing systems to expand memory on demand and improve resource utilization. However, CXL is still in its infancy and lacks commodity products on the market, which makes a reliable system-level simulation tool necessary for research and development. In this paper, we propose CXL-DMSim, an open-source full-system simulator that models CXL disaggregated memory systems with high fidelity at a simulation speed comparable to gem5. CXL-DMSim incorporates a flexible CXL memory expander model with its associated device driver, and supports the CXL.io and CXL.mem protocols. It can operate in either app-managed or kernel-managed mode, with the latter using a dedicated NUMA-compatible mechanism. The simulator has been rigorously verified against a real hardware testbed with both FPGA- and ASIC-based CXL memory devices, showing that CXL-DMSim reproduces the characteristics of various CXL memory devices with an average simulation error of 3.4%. Experimental results with the LMbench and STREAM benchmarks suggest that CXL-FPGA memory has about 2.88x the latency of local DDR, while CXL-ASIC memory has about 2.18x; CXL-FPGA achieves 45-69% of local DDR memory bandwidth, whereas CXL-ASIC achieves 82-83%. The study also shows that CXL memory can significantly improve the performance of memory-intensive applications: by up to 23x for the Viper key-value database when local memory is limited, and by approximately 60% in memory-bandwidth-sensitive scenarios such as MERCI. Moreover, detailed case studies showcase the simulator's observability and expandability, highlighting its potential for research on future CXL-interconnected hybrid memory pools.