Because the number of memory channels is bounded by physical dimensions, memory expanders are a promising way to extend memory capacity and channel count: they augment an existing I/O interface (e.g., PCIe) with a memory-semantic protocol such as CXL. Unfortunately, the physical constraints of a computing system also limit how far capacity can scale with memory expanders. In this work, we propose IBEX, a block-level compression scheme for modern memory expanders that achieves larger effective memory capacity. Because block-level compression algorithms (e.g., LZ77) incur performance overhead, IBEX employs a promotion-based approach: only cold data is compressed, whereas hot data remains uncompressed. Our key innovation is internal-bandwidth-efficient block management that precisely identifies cold pages with minimal metadata access overhead. Still, the promotion-based approach poses several performance-related challenges at the design level. We therefore also propose a shadowed promotion scheme that temporarily postpones the deallocation of promoted data, thereby mitigating the performance penalty incurred by demotion (i.e., recompression). Furthermore, we optimize our compression scheme by compacting metadata and co-locating multiple target blocks for efficient bandwidth utilization. Consequently, IBEX achieves average speedups of 1.28x-1.40x over state-of-the-art promotion-based block-level approaches. We open-source IBEX at https://github.com/relacslab/ibex-ics26.
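To make the promotion-based idea concrete, the following is a minimal software sketch, not the IBEX hardware design. It assumes one plausible reading of shadowed promotion: when a compressed block is promoted, its compressed copy is kept for a while, so a later demotion of the still-unmodified block can reuse that copy instead of recompressing. The class `ToyExpander`, the 4096-byte block size, the LRU demotion policy, and the use of `zlib` (DEFLATE, an LZ77-based codec) are all illustrative assumptions that do not come from the abstract.

```python
import zlib
from collections import OrderedDict

BLOCK_SIZE = 4096  # assumed block size in bytes (not stated in the abstract)


class ToyExpander:
    """Toy model: cold blocks are stored compressed, hot blocks uncompressed."""

    def __init__(self, hot_capacity):
        self.hot_capacity = hot_capacity  # max number of uncompressed blocks
        self.hot = OrderedDict()          # block id -> raw bytes, in LRU order
        self.cold = {}                    # block id -> compressed bytes
        self.shadow = {}                  # block id -> retained compressed copy
        self.compressions = 0             # number of (re)compressions performed

    def _demote_lru(self):
        """Move the least recently used hot block back to compressed storage."""
        block_id, data = self.hot.popitem(last=False)
        retained = self.shadow.pop(block_id, None)
        if retained is not None:
            # Block was not written since promotion: reuse the retained copy.
            self.cold[block_id] = retained
        else:
            self.cold[block_id] = zlib.compress(data)
            self.compressions += 1

    def _insert_hot(self, block_id, data):
        self.hot[block_id] = data
        self.hot.move_to_end(block_id)
        while len(self.hot) > self.hot_capacity:
            self._demote_lru()

    def write(self, block_id, data):
        assert len(data) == BLOCK_SIZE
        self.cold.pop(block_id, None)
        self.shadow.pop(block_id, None)  # a write makes the retained copy stale
        self._insert_hot(block_id, data)

    def read(self, block_id):
        if block_id in self.hot:
            self.hot.move_to_end(block_id)
            return self.hot[block_id]
        compressed = self.cold.pop(block_id)
        data = zlib.decompress(compressed)
        # Shadowed promotion (assumed semantics): postpone deallocation of
        # the compressed copy while the block lives in the hot region.
        self.shadow[block_id] = compressed
        self._insert_hot(block_id, data)
        return data


exp = ToyExpander(hot_capacity=1)
exp.write(0, bytes(BLOCK_SIZE))      # block 0 becomes hot
exp.write(1, b"\x01" * BLOCK_SIZE)   # block 0 is demoted and compressed
exp.read(0)                          # block 0 promoted; block 1 compressed
exp.read(1)                          # block 0 demoted again via its retained copy
print(exp.compressions)
```

In this toy model the retained copy costs extra capacity for as long as the block stays hot, which is presumably why the abstract describes the postponement as temporary; a real design would need a policy for when to release it, which this sketch does not model.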