Remote Direct Memory Access (RDMA) is a memory technology that allows remote devices to directly write to and read from each other's memory, bypassing components such as the CPU and operating system. This enables low-latency high-throughput networking, as required for many modern data centres, HPC applications and AI/ML workloads. However, baseline RDMA comprises a highly permissive weak memory model that is difficult to use in practice and has only recently been formalised. In this paper, we introduce the Library of Composable Objects (LOCO), a formally verified library for building multi-node objects on RDMA, filling the gap between shared memory and distributed system programming. LOCO objects are well-encapsulated and take advantage of the strong locality and the weak consistency characteristics of RDMA. They have performance comparable to custom RDMA systems (e.g. distributed maps), but with a far simpler programming model amenable to formal proofs of correctness. To support verification, we develop a novel modular declarative verification framework, called Mowgli, that is flexible enough to model multinode objects and is independent of a memory consistency model. We instantiate Mowgli with the RDMA memory model, and use it to verify correctness of LOCO libraries.
翻译:远程直接内存访问(RDMA)是一种内存技术,允许远程设备直接相互读写对方的内存,绕过CPU和操作系统等组件。这实现了低延迟高吞吐量的网络通信,满足了许多现代数据中心、高性能计算应用以及人工智能/机器学习工作负载的需求。然而,基线RDMA包含一个高度宽松的弱内存模型,在实践中难以使用,且直到最近才被形式化。本文介绍了可组合对象库(LOCO),这是一个经过形式化验证的库,用于在RDMA上构建多节点对象,填补了共享内存与分布式系统编程之间的空白。LOCO对象封装良好,并充分利用了RDMA的强局部性和弱一致性特性。其性能可与定制RDMA系统(例如分布式映射)相媲美,但编程模型更为简单,适合进行正确性的形式化证明。为支持验证,我们开发了一种新颖的模块化声明式验证框架,称为Mowgli,该框架足够灵活以建模多节点对象,且独立于内存一致性模型。我们使用RDMA内存模型实例化Mowgli,并利用它验证LOCO库的正确性。