Disaggregated storage with NVMe-over-Fabrics (NVMe-oF) has emerged as the standard solution in modern supercomputers and data center clusters, achieving superior performance, resource utilization, and power efficiency. Simultaneously, confidential computing (CC) is becoming the de facto security paradigm, enforcing stronger isolation and protection for sensitive workloads. However, traditional CC methods for securing state-of-the-art storage struggle to scale and compromise either performance or security. To address these issues, we introduce Hazel, a storage management system that extends the capabilities of the NVMe-oF protocol and adheres to the CC threat model, providing confidentiality, integrity, and freshness guarantees. Hazel offers a control path suited to this setting, introducing novel concepts such as counter-leasing. Hazel also optimizes data path performance by leveraging NVMe metadata and introducing a new disaggregated Hazel Merkle Tree (HMT), all while remaining compatible with NVMe-oF. For additional efficiency, Hazel supports offloading to CC-capable smart NIC accelerators. We prototype Hazel on an NVIDIA BlueField-3 and demonstrate that it incurs as little as 1-2% performance degradation across synthetic patterns, AI training, IO500, and YCSB.