Remote direct memory access (RDMA) networks are being rapidly adopted in industry for their high speed, low latency, and reduced CPU overhead compared to traditional kernel-based TCP/IP networks. RDMA enables threads to access remote memory without interacting with another process. However, the technology does not guarantee atomicity between local and remote accesses, which significantly complicates synchronization. The current solution is to require threads that want to access local memory in an RDMA-accessible region to pass through the RDMA card using a mechanism known as loopback, but this can quickly degrade performance. In this paper, we introduce ALock, a novel locking primitive designed for RDMA-based systems. ALock allows programmers to synchronize local and remote accesses without using loopback or remote procedure calls (RPCs). We draw inspiration from the classic Peterson's algorithm to create a hierarchical design that includes embedded MCS locks for two cohorts, remote and local. To evaluate ALock, we implement a distributed lock table and measure throughput and latency across various cluster configurations and workloads. In workloads with a majority of local operations, ALock outperforms competitors by up to 29x and achieves up to 20x lower latency.
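The hierarchical idea summarized above (Peterson's algorithm arbitrating between a local and a remote cohort, with a queue lock inside each cohort) can be illustrated with a minimal single-process sketch. This is not the ALock protocol itself: it involves no RDMA, it substitutes a plain `threading.Lock` for each embedded MCS lock, and every name in it (`TwoCohortLock`, `run_demo`, `LOCAL`, `REMOTE`) is hypothetical. It also assumes CPython's global interpreter lock, so that the plain reads and writes below behave as sequentially consistent, which Peterson's algorithm requires.

```python
import threading
import time

LOCAL, REMOTE = 0, 1  # the two cohorts named in the abstract


class TwoCohortLock:
    """Peterson's algorithm between two cohorts.

    Assumption of this sketch: a plain mutex per cohort stands in for
    the embedded MCS queue lock described in the paper.
    """

    def __init__(self):
        self.cohort_locks = [threading.Lock(), threading.Lock()]
        self.interested = [False, False]  # Peterson flag, one per cohort
        self.victim = LOCAL               # cohort that waits on a tie

    def acquire(self, cohort):
        other = 1 - cohort
        # Step 1: become the single representative of this cohort.
        self.cohort_locks[cohort].acquire()
        # Step 2: Peterson's entry protocol against the other cohort.
        self.interested[cohort] = True
        self.victim = cohort
        while self.interested[other] and self.victim == cohort:
            time.sleep(0)  # yield so the other cohort can make progress

    def release(self, cohort):
        # Leave Peterson's critical section, then admit the next member
        # of the same cohort.
        self.interested[cohort] = False
        self.cohort_locks[cohort].release()


def run_demo(threads_per_cohort=2, iterations=200):
    """Run workers from both cohorts; return (counter, violations)."""
    lock = TwoCohortLock()
    state = {"counter": 0, "inside": 0, "violations": 0}

    def worker(cohort):
        for _ in range(iterations):
            lock.acquire(cohort)
            state["inside"] += 1
            if state["inside"] != 1:
                state["violations"] += 1  # two threads inside at once
            # Deliberately non-atomic read-modify-write with a yield in
            # the middle, so a broken lock would lose updates.
            value = state["counter"]
            time.sleep(0)
            state["counter"] = value + 1
            state["inside"] -= 1
            lock.release(cohort)

    threads = [
        threading.Thread(target=worker, args=(cohort,))
        for cohort in (LOCAL, REMOTE)
        for _ in range(threads_per_cohort)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["counter"], state["violations"]


if __name__ == "__main__":
    print(run_demo())
```

If mutual exclusion holds, `run_demo()` returns a counter equal to the total number of acquisitions and zero violations. The sketch only shows why two levels suffice: Peterson's algorithm handles exactly two contenders, so each cohort first elects one representative through its own lock, and only the representatives compete across the local/remote boundary.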