Concurrent data structures often require memory for synchronization on top of the memory that stores their elements. The size of this overhead determines how memory-friendly an implementation is. A memory-optimal implementation has the minimal possible memory overhead, which, in practice, reduces cache misses and unnecessary memory reclamation. In this paper, we discuss the memory-optimality of non-blocking bounded queues. Specifically, we investigate whether one can construct an implementation that uses a pre-allocated array to store elements and only constant memory overhead, e.g., two positioning counters for the enqueue(..) and dequeue() operations. Such an implementation can be readily constructed when the ABA problem is precluded, e.g., when the hardware supports LL/SC instructions or when all inserted elements are distinct. In the general case, however, we show that a memory-optimal non-blocking bounded queue incurs overhead linear in the number of concurrent processes. These results not only provide helpful intuition for concurrent algorithm developers but also open a new research avenue on memory-optimality in concurrent data structures.
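
To make the target layout concrete, the following is a minimal single-threaded sketch of a bounded queue that keeps only a pre-allocated array and two positioning counters. It is not the paper's algorithm and it is not thread-safe: the class name `BoundedQueueLayout` and the fields `slots`, `enqueues`, and `dequeues` are hypothetical names chosen for illustration. A non-blocking version would have to replace the plain reads and writes with atomic operations, which is where the ABA problem enters.

```python
from typing import Generic, List, Optional, TypeVar

T = TypeVar("T")


class BoundedQueueLayout(Generic[T]):
    """Single-threaded illustration only: a fixed array plus two counters."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        # Pre-allocated storage for elements; never resized.
        self.slots: List[Optional[T]] = [None] * capacity
        # The only extra state: total counts of completed operations.
        self.enqueues = 0
        self.dequeues = 0

    def enqueue(self, item: T) -> bool:
        """Store item at the tail; return False if the queue is full."""
        if self.enqueues == self.dequeues + self.capacity:
            return False
        self.slots[self.enqueues % self.capacity] = item
        self.enqueues += 1
        return True

    def dequeue(self) -> Optional[T]:
        """Remove and return the oldest item, or None if the queue is empty."""
        if self.dequeues == self.enqueues:
            return None
        index = self.dequeues % self.capacity
        item = self.slots[index]
        self.slots[index] = None  # clear the slot so it can be reused
        self.dequeues += 1
        return item
```

In this sketch the memory overhead is the two integer counters, independent of both the capacity and the number of callers. The counters only grow, and a slot index is obtained by taking a counter modulo the capacity.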