Concurrent priority queues are widely used in important workloads, such as graph applications and discrete event simulations. However, designing scalable concurrent priority queues for NUMA architectures is challenging. Although several NUMA-oblivious implementations scale to a high number of threads by exploiting the parallelism of insert operations, they scale poorly in deleteMin-dominated workloads. This is because all threads compete to access the same memory locations, i.e., the highest-priority element of the queue, thus incurring excessive cache coherence traffic and non-uniform memory accesses between the nodes of a NUMA system. In such scenarios, NUMA-aware implementations are typically used to improve performance on NUMA systems. In this work, we propose an adaptive priority queue, called SmartPQ. SmartPQ tunes itself by switching between a NUMA-oblivious and a NUMA-aware algorithmic mode to achieve high performance under all contention scenarios. SmartPQ has two key components. First, it is built on top of NUMA Node Delegation (Nuddle), a generic low-overhead technique to construct efficient NUMA-aware data structures using any concurrent NUMA-oblivious implementation as its backbone. Second, SmartPQ integrates a lightweight decision-making mechanism to decide when to switch between the NUMA-oblivious and NUMA-aware algorithmic modes. Our evaluation shows that, in NUMA systems, SmartPQ performs best across contention scenarios with an 87.9% success rate, and dynamically adapts between the NUMA-aware and NUMA-oblivious algorithmic modes with negligible performance overheads. SmartPQ improves performance by 1.87x on average over SprayList, the state-of-the-art NUMA-oblivious priority queue.
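The abstract does not describe SmartPQ's internals, so the following Python sketch only illustrates the general idea of one backbone queue used in two modes under stated assumptions. `LockedHeapPQ` is a hypothetical stand-in for a NUMA-oblivious queue (it is not SprayList); a single server thread that executes delegated operations is a simplified, hypothetical stand-in for Nuddle's delegation (no per-node servers, no thread pinning); and the `window`/`delete_threshold` rule is a hypothetical placeholder for SmartPQ's decision-making mechanism, which is not reproduced here. Because both modes operate on the same thread-safe backbone, switching modes requires no data migration in this sketch.

```python
import heapq
import queue
import threading


class LockedHeapPQ:
    """Hypothetical NUMA-oblivious backbone: a lock-protected binary heap."""

    def __init__(self):
        self._heap = []
        self._lock = threading.Lock()

    def insert(self, key):
        with self._lock:
            heapq.heappush(self._heap, key)

    def delete_min(self):
        with self._lock:
            return heapq.heappop(self._heap) if self._heap else None


class AdaptivePQ:
    """Hypothetical sketch: switch between direct access to the backbone
    (NUMA-oblivious mode) and delegating every operation to one server
    thread (a simplified stand-in for NUMA-aware delegation)."""

    def __init__(self, backbone, window=64, delete_threshold=0.5):
        self._backbone = backbone
        self._window = window                      # hypothetical: ops per decision
        self._delete_threshold = delete_threshold  # hypothetical: deleteMin share
        self._stats_lock = threading.Lock()
        self._ops = 0
        self._deletes = 0
        self.delegating = False
        self._requests = queue.Queue()
        threading.Thread(target=self._serve, daemon=True).start()

    def _serve(self):
        # Server thread: applies delegated operations to the shared backbone.
        while True:
            op, key, reply = self._requests.get()
            if op == "insert":
                self._backbone.insert(key)
                reply.put(None)
            else:
                reply.put(self._backbone.delete_min())

    def _record(self, is_delete):
        # Placeholder decision step: re-evaluate the mode once per window
        # based on the fraction of deleteMin operations observed.
        with self._stats_lock:
            self._ops += 1
            self._deletes += int(is_delete)
            if self._ops >= self._window:
                self.delegating = self._deletes / self._ops >= self._delete_threshold
                self._ops = self._deletes = 0

    def _run(self, op, key=None):
        self._record(op == "delete_min")
        # Reading the mode without a lock is benign here: both paths are
        # correct because the backbone itself is thread-safe.
        if self.delegating:
            reply = queue.Queue(maxsize=1)
            self._requests.put((op, key, reply))
            return reply.get()
        # NUMA-oblivious mode: the calling thread operates on the backbone.
        if op == "insert":
            self._backbone.insert(key)
            return None
        return self._backbone.delete_min()

    def insert(self, key):
        self._run("insert", key)

    def delete_min(self):
        return self._run("delete_min")


# Usage: an insert-only window keeps direct access; a deleteMin-only
# window switches to delegation.
pq = AdaptivePQ(LockedHeapPQ(), window=4)
for k in [5, 3, 8, 1]:
    pq.insert(k)
print(pq.delegating)                         # False
print([pq.delete_min() for _ in range(4)])   # [1, 3, 5, 8]
print(pq.delegating)                         # True
```

Python threads cannot demonstrate NUMA effects, so this sketch shows control flow only; any performance behavior depends on a real implementation with thread placement and per-node delegation.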