Various constraints of Static Random Access Memory (SRAM) are leading researchers to consider new memory technologies as candidates for building on-chip shared last-level caches (SLLCs). Spin-Transfer Torque RAM (STT-RAM) is currently regarded as the prime contender due to its better energy efficiency, smaller die footprint and higher scalability. However, STT-RAM also exhibits some drawbacks, such as slow and energy-hungry write operations, that need to be mitigated. In this work we address these shortcomings with a new management mechanism for STT-RAM SLLCs. The approach builds on the earlier observation that the stream of references arriving at the SLLC of a Chip MultiProcessor (CMP) exhibits reuse locality, i.e., blocks that have been referenced several times have a high probability of being reused again soon. Accordingly, we employ a cache management mechanism that selects the contents of the SLLC so as to exploit reuse locality instead of temporal locality. Specifically, our proposal places a Reuse Detector between the private cache levels and the STT-RAM SLLC. The detector identifies blocks that do not exhibit reuse and prevents their insertion in the SLLC, thereby reducing the number of write operations and the energy consumption in the STT-RAM. Our evaluation reveals that our scheme achieves, on average, energy reductions in the SLLC in the range of 37-30\%, additional energy savings in the main memory in the range of 6-8\%, and performance improvements from 3\% up to 14\% (16-core) compared to an STT-RAM SLLC baseline where no reuse detector is employed. More importantly, our approach outperforms DASCA, the state-of-the-art STT-RAM SLLC management scheme: its SLLC energy savings are 4-11\% higher than those of DASCA, its performance is 1.5-14\% higher, and its improvements in DRAM energy consumption are 2-9\% higher than those of DASCA.
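The filtering idea can be illustrated with a minimal sketch. It is not the design evaluated in this work: the abstract does not specify how the Reuse Detector is organized, so the bounded table of recently seen block addresses, the class names `ReuseDetector` and `FilteredSLLC`, and the unbounded SLLC without replacement are all assumptions made for illustration only. A block is written into the SLLC only when the detector has already seen it; otherwise it bypasses the SLLC.

```python
from collections import OrderedDict


class ReuseDetector:
    """Hypothetical bounded table of recently seen block addresses (LRU order)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.seen = OrderedDict()

    def observe(self, block_addr):
        """Return True if the block was seen before, i.e. it shows reuse."""
        if block_addr in self.seen:
            self.seen.move_to_end(block_addr)  # refresh its recency
            return True
        self.seen[block_addr] = None
        if len(self.seen) > self.capacity:
            self.seen.popitem(last=False)  # forget the oldest address
        return False


class FilteredSLLC:
    """Simplified SLLC model: unbounded, no replacement (an assumption)."""

    def __init__(self, detector):
        self.detector = detector
        self.blocks = set()
        self.writes = 0    # blocks written into the STT-RAM array
        self.bypassed = 0  # blocks kept out of the SLLC

    def on_block_arrival(self, block_addr):
        """Handle a block arriving from the private cache levels."""
        if block_addr in self.blocks:
            return  # already resident: no new insertion in this model
        if self.detector.observe(block_addr):
            self.blocks.add(block_addr)
            self.writes += 1
        else:
            self.bypassed += 1


def simulate(trace, detector_capacity=4):
    """Run a trace of block addresses; return (writes, bypassed)."""
    sllc = FilteredSLLC(ReuseDetector(detector_capacity))
    for addr in trace:
        sllc.on_block_arrival(addr)
    return sllc.writes, sllc.bypassed


if __name__ == "__main__":
    print(simulate([1, 2, 3, 1, 4, 5, 1, 2]))
```

In this toy trace only block 1 is seen again while the detector still remembers it, so it is the only block written into the SLLC. Block 2 returns after its entry has been displaced from the four-entry table and is therefore bypassed again, which shows how the detector's capacity bounds the reuse distance it can recognize.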