Concurrent data structures often require additional memory for handling synchronization, beyond the memory needed to store elements. Depending on the amount of this additional memory, implementations can be more or less memory-friendly. A memory-optimal implementation has the minimal possible memory overhead, which, in practice, reduces cache misses and unnecessary memory reclamation. In this paper, we discuss the memory-optimality of non-blocking bounded queues. Specifically, we investigate whether one can construct an implementation that uses a pre-allocated array to store elements and only constant memory overhead, e.g., two positioning counters for the enqueue(..) and dequeue() operations. Such an implementation can be readily constructed when the ABA problem is precluded, e.g., when the hardware supports LL/SC instructions or all inserted elements are distinct. However, in the general case, we show that a memory-optimal non-blocking bounded queue incurs overhead linear in the number of concurrent processes. These results not only provide helpful intuition for concurrent algorithm developers but also open a new research avenue on the memory-optimality phenomenon in concurrent data structures.
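The layout the abstract describes, a pre-allocated array plus two positioning counters, can be illustrated with the classic single-producer/single-consumer ring buffer commonly attributed to Lamport. This is a swapped-in, deliberately simplified setting and not the construction studied in the paper. Because each counter has exactly one writer, no CAS is needed and the ABA problem cannot arise, so constant overhead comes for free. The general setting, with many concurrent processes and possibly repeated elements, is where the abstract's linear-overhead result applies. The class and method names below are hypothetical. Python is used only for brevity, and the sketch assumes that CPython's GIL provides the ordering that acquire/release atomics on the counters would provide in C or C++.

```python
import threading
import time


class SPSCRingQueue:
    """Bounded queue: a pre-allocated array plus two monotonically growing counters.

    Hypothetical sketch, safe only for ONE producer thread and ONE consumer thread.
    Each counter has a single writer, so no CAS (and hence no ABA) is involved.
    Assumes CPython's GIL for memory ordering; a C/C++ port needs atomics.
    """

    def __init__(self, capacity):
        self._buf = [None] * capacity  # pre-allocated element storage
        self._cap = capacity
        self._head = 0  # dequeue position; written only by the consumer
        self._tail = 0  # enqueue position; written only by the producer

    def try_enqueue(self, item):
        if self._tail - self._head == self._cap:
            return False  # queue is full
        self._buf[self._tail % self._cap] = item  # write the slot first...
        self._tail += 1  # ...then publish it by advancing the counter
        return True

    def try_dequeue(self):
        if self._head == self._tail:
            return False, None  # queue is empty
        slot = self._head % self._cap
        item = self._buf[slot]
        self._buf[slot] = None  # drop the reference before releasing the slot
        self._head += 1  # hand the slot back to the producer
        return True, item


def demo(n=2000, capacity=16):
    """Run one producer and one consumer; return the items in dequeue order."""
    q = SPSCRingQueue(capacity)
    received = []

    def producer():
        for i in range(n):
            while not q.try_enqueue(i):
                time.sleep(0)  # full: give the consumer a chance to run

    def consumer():
        while len(received) < n:
            ok, item = q.try_dequeue()
            if ok:
                received.append(item)
            else:
                time.sleep(0)  # empty: give the producer a chance to run

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received
```

Returning an `(ok, item)` pair rather than a bare value keeps `None` usable as an element. The memory overhead is the two counters regardless of capacity, which matches the constant-overhead target the abstract describes for the easy, ABA-free case.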