Disaggregated storage with NVMe-over-Fabrics (NVMe-oF) has emerged as the standard solution in modern data centers, offering superior performance, resource utilization, and power efficiency. At the same time, confidential computing (CC) is becoming the de facto security paradigm, enforcing stronger isolation and protection for sensitive workloads. However, traditional CC methods for securing state-of-the-art storage struggle to scale and compromise either performance or security. To address these issues, we introduce sNVMe-oF, a storage management system that extends NVMe-oF and adheres to the CC threat model by providing confidentiality, integrity, and freshness guarantees. sNVMe-oF provides a suitable control path and introduces novel concepts such as counter-leasing. It also optimizes data path performance by leveraging NVMe metadata, introducing a new disaggregated Hazel Merkle Tree (HMT), and avoiding redundant IPsec protections. We achieve this without modifying the NVMe-oF protocol. To deliver line rate without excessive resource usage, sNVMe-oF also uses the accelerators of CC-capable smart NICs. We prototype sNVMe-oF on an NVIDIA BlueField-3 and show that it incurs as little as 2% performance degradation for synthetic patterns and AI training.