DiscoRD: An Experimental Methodology for Quickly Discovering the Reliable Read Disturbance Threshold of Real DRAM Chips

State-of-the-art DRAM read disturbance mitigations rely on the read disturbance threshold (RDT) (e.g., the number of aggressor row activations needed to induce the first read disturbance bitflip) to prevent read disturbance bitflips securely and with high performance and energy efficiency. However, accurately and exhaustively characterizing the RDT of every DRAM row in a chip is time-intensive. Rapidly determining the RDT is therefore important for enabling secure, high-performance, and energy-efficient systems. Our goal is to develop and evaluate a reliable and rapid read disturbance testing methodology. To that end, we develop DiscoRD, building on the key results of an extensive experimental characterization study using 212 real DDR4 chips, in which we measure the RDT of hundreds of thousands of DRAM rows millions of times. We develop an empirical model for read disturbance bitflips and evaluate the probability of read-disturbance-induced uncorrectable errors when a read disturbance mitigation mechanism is configured using a single $RDT_{min}$ measurement. Using this model, we demonstrate that 1) relying on a lightweight error-correcting code (ECC) alone yields a relatively high uncorrectable error probability and 2) combining ECC, infrequent memory scrubbing, and configurable read disturbance mitigation mechanisms can greatly reduce the error probability. Building on our observations and analyses, we discuss how the RDT of each individual row can be identified more precisely. Our results show that error tolerance, memory scrubbing, online profiling, and run-time configurable read disturbance mitigation techniques are important to enable secure and energy-efficient spatial-variation-aware read disturbance mitigations. We hope that DiscoRD drives research that enables us to quantitatively navigate the performance/cost versus reliability tradeoff space for read disturbance mitigation techniques.
