Disaggregated memory leverages recent advances in high-density, byte-addressable non-volatile memory and high-performance interconnects to provide a large memory pool shared across multiple compute nodes. Because of the higher memory density, memory errors may become more frequent. Unfortunately, tolerating them with existing memory-error protection techniques becomes impractical because of the increasing storage cost. This work proposes replication-aware memory-error protection, which improves the storage efficiency of protection in data-centric applications that already rely on memory replication for performance and availability. It lets such applications lower the storage cost of protection by weakening the protection of each individual replica while still meeting a strong protection target through the collective protection conferred by multiple replicas.
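The trade-off between per-replica and collective protection can be illustrated with a toy reliability model. This is only a sketch and not the paper's analysis: it assumes that replicas fail independently, that a bad replica is always detected, and that data is lost only when every replica is unrecoverable. The function names `collective_failure_probability` and `max_per_replica_failure` are hypothetical.

```python
import math


def _check_probability(value: float, name: str) -> None:
    """Reject values that are not valid probabilities."""
    if not math.isfinite(value) or not 0.0 <= value <= 1.0:
        raise ValueError(f"{name} must be a probability in [0, 1]")


def collective_failure_probability(per_replica_failure: float, replicas: int) -> float:
    """Probability that all replicas are unrecoverable.

    Assumption (illustrative only): replica failures are independent,
    so the joint probability is the product of the individual ones.
    """
    _check_probability(per_replica_failure, "per_replica_failure")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return per_replica_failure ** replicas


def max_per_replica_failure(target_failure: float, replicas: int) -> float:
    """Weakest per-replica failure probability that still meets the target.

    Inverts the independence model above: p ** replicas <= target.
    """
    _check_probability(target_failure, "target_failure")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return target_failure ** (1.0 / replicas)


if __name__ == "__main__":
    target = 1e-12  # hypothetical protection target, chosen for illustration
    for n in (1, 2, 3):
        print(n, max_per_replica_failure(target, n))
```

Under these assumptions, each additional replica lets every individual replica tolerate a much higher failure probability for the same overall target, which is the intuition behind spending less storage on per-replica protection. Correlated failures or undetected errors would weaken this bound.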