The growing demands of Large Language Model (LLM) training and inference are accelerating the adoption of scale-up systems that extend server shared memory through Compute Express Link (CXL)-based load/store interconnects. Accurate full-system simulation of such architectures remains challenging because existing tools, all of them very recent, rely on simplified or non-compliant architectural models, which limits their accuracy and usability. We present CXLRAMSim, the first gem5-integrated full-system simulator that models CXL devices at their correct position on the I/O bus. This placement enables the use of unmodified Linux kernels and software stacks, realistic latency-bandwidth behavior, and true interleaving with system DRAM. Our approach provides high-fidelity CXL.mem characterization and captures key challenges such as cache pollution when accessing CXL memory.