The emerging CXL.mem standard provides a new type of byte-addressable remote memory with a variety of memory types and hierarchies. With CXL.mem, multiple layers of memory (e.g., local DRAM and CXL-attached remote memory at different locations) are exposed to operating systems and user applications, bringing new challenges and research opportunities. Unfortunately, because CXL.mem devices are not yet commercially available, it is difficult for researchers to conduct systems research that uses CXL.mem. In this paper, we present our ongoing work, CXLMemSim, a fast and lightweight CXL.mem simulator for performance characterization. CXLMemSim uses a performance model driven by performance monitoring events, which are supported by most commodity processors. Specifically, CXLMemSim attaches to an existing, unmodified program and divides its execution into multiple epochs; when an epoch finishes, CXLMemSim collects the performance monitoring events and calculates the simulated execution time of that epoch from them. In this way, CXLMemSim avoids the performance overhead of a full-system simulator (e.g., gem5) and allows the memory hierarchy and latency to be adjusted easily, enabling research such as memory scheduling for complex applications. Our preliminary evaluation shows that CXLMemSim slows down the execution of the attached program by 4.41x on average for real-world applications.
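The epoch-based approach described in the abstract can be illustrated with a minimal sketch. This is not CXLMemSim's actual performance model, which the abstract does not specify; it assumes a deliberately simple model in which each last-level cache miss served by CXL memory costs the difference between the CXL latency and the local DRAM latency. The names `EpochSample`, `simulated_epoch_ns`, and `simulate`, the `remote_fraction` field, and the default latency values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class EpochSample:
    """Counters collected for one epoch (hypothetical fields)."""
    native_ns: float        # measured execution time of the epoch on the host
    llc_misses: int         # last-level cache misses observed during the epoch
    remote_fraction: float  # assumed share of misses that CXL memory would serve


def simulated_epoch_ns(sample: EpochSample,
                       local_latency_ns: float,
                       cxl_latency_ns: float) -> float:
    """Estimate the epoch's time if part of its misses went to CXL memory.

    Assumption: every remote miss adds (cxl - local) latency and misses
    do not overlap. A real model would account for memory-level parallelism.
    """
    remote_misses = sample.llc_misses * sample.remote_fraction
    extra_per_miss_ns = cxl_latency_ns - local_latency_ns
    return sample.native_ns + remote_misses * extra_per_miss_ns


def simulate(samples: Iterable[EpochSample],
             local_latency_ns: float = 90.0,    # placeholder, not a measurement
             cxl_latency_ns: float = 250.0) -> float:  # placeholder as well
    """Sum the simulated time over all epochs."""
    return sum(simulated_epoch_ns(s, local_latency_ns, cxl_latency_ns)
               for s in samples)


if __name__ == "__main__":
    epochs = [EpochSample(native_ns=1000.0, llc_misses=10, remote_fraction=0.5)]
    print(simulate(epochs, local_latency_ns=100.0, cxl_latency_ns=300.0))
```

Because the latencies are plain parameters, changing the simulated memory hierarchy in this sketch only requires passing different values, which mirrors the adjustability the abstract claims for the epoch-based design.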