Concurrent data structures often require memory for synchronization on top of the memory used to store elements. Depending on the amount of this additional memory, implementations can be more or less memory-friendly. A memory-optimal implementation has the smallest possible memory overhead, which, in practice, reduces cache misses and unnecessary memory reclamation. In this paper, we discuss the memory-optimality of non-blocking bounded queues. Specifically, we investigate whether one can construct an implementation that uses a pre-allocated array to store elements and only constant additional memory, e.g., two positioning counters for the enqueue(..) and dequeue() operations. Such an implementation can be readily constructed when the ABA problem is precluded, e.g., assuming that the hardware supports LL/SC instructions or that all inserted elements are distinct. In the general case, however, we show that a memory-optimal non-blocking bounded queue incurs memory overhead linear in the number of concurrent processes. These results not only provide helpful intuition for concurrent algorithm developers but also open a new research avenue on the memory-optimality phenomenon in concurrent data structures.
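To make the target memory layout concrete, the following is a minimal sequential sketch in Python of a bounded queue that stores elements in a pre-allocated array and keeps only two counters as overhead. It is an illustration under stated assumptions, not the algorithm from this paper: the class name `BoundedQueueSketch` and the field names `slots`, `enqueues`, and `dequeues` are hypothetical, `None` is assumed to mark an empty slot, and the code is single-threaded, so it is not non-blocking and says nothing about the ABA problem.

```python
class BoundedQueueSketch:
    """Sequential sketch: one pre-allocated array plus two counters.

    Hypothetical illustration of the memory layout only; it is NOT
    thread-safe and NOT the non-blocking algorithm discussed in the paper.
    """

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.slots = [None] * capacity  # pre-allocated storage for elements
        self.enqueues = 0  # number of successful enqueue calls so far
        self.dequeues = 0  # number of successful dequeue calls so far

    def enqueue(self, item):
        """Insert item; return False if the queue is full."""
        # The queue is full when the counters are exactly `capacity` apart.
        if self.enqueues - self.dequeues == len(self.slots):
            return False
        self.slots[self.enqueues % len(self.slots)] = item
        self.enqueues += 1
        return True

    def dequeue(self):
        """Remove and return the oldest item, or None if the queue is empty."""
        if self.enqueues == self.dequeues:
            return None
        index = self.dequeues % len(self.slots)
        item = self.slots[index]
        self.slots[index] = None  # mark the slot as empty again
        self.dequeues += 1
        return item
```

In this sketch, each operation updates a slot and a counter in two separate steps. A concurrent version would have to make those steps safe against processes that stall between them and resume after the array has wrapped around, which is where the ABA problem mentioned above comes in.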