Extensive polling in shared-memory manycore systems can lead to contention, decreased throughput, and poor energy efficiency. Both lock implementations and the general-purpose atomic operation, load-reserved/store-conditional (LRSC), cause polling due to serialization and retries. To alleviate this overhead, we propose LRwait and SCwait, a synchronization pair that eliminates polling by allowing contending cores to sleep while waiting for previous cores to finish their atomic access. To implement LRwait, we present Colibri, a distributed and scalable approach to managing LRwait reservations. Through extensive benchmarking on an open-source RISC-V platform with 256 cores, we demonstrate that Colibri outperforms current synchronization approaches in throughput, fairness, and energy efficiency for various concurrent algorithms under both high and low contention. With an area overhead of only 6%, Colibri outperforms LRSC-based implementations by a factor of 6.5x in throughput and 7.1x in energy efficiency.
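To make the polling argument concrete, the following toy model counts memory requests for a single contended atomic update. It is a sketch under strong assumptions that do not come from the text above: all cores contend in lockstep rounds, exactly one store-conditional succeeds per round, and waiting LRwait requests are served in FIFO order. The function names `lrsc_requests` and `lrwait_requests` are hypothetical, and the model is not Colibri's actual hardware design.

```python
from collections import deque


def lrsc_requests(num_cores: int) -> int:
    """Toy worst case for LR/SC: every pending core retries each round.

    Assumption: in each round all pending cores issue one LR and one SC,
    exactly one SC succeeds, and the others must poll again.
    """
    requests = 0
    pending = num_cores
    while pending > 0:
        requests += 2 * pending  # one LR plus one SC per pending core
        pending -= 1             # only one core completes per round
    return requests


def lrwait_requests(num_cores: int) -> tuple[int, list[int]]:
    """Toy model of LRwait/SCwait: waiting cores sleep instead of polling.

    Assumption: each core issues a single LRwait that is held in a FIFO
    queue; only the head receives a response, and its SCwait hands the
    reservation to the next waiting core.
    """
    queue = deque(range(num_cores))  # one LRwait request per core
    requests = num_cores
    service_order = []
    while queue:
        core = queue.popleft()       # head of the queue is woken up
        requests += 1                # its single SCwait, which always succeeds
        service_order.append(core)
    return requests, service_order


if __name__ == "__main__":
    for n in (4, 256):
        print(n, lrsc_requests(n), lrwait_requests(n)[0])
```

In this simplified model the LRSC request count grows quadratically with the number of contending cores, while the LRwait/SCwait count grows linearly and cores are served in arrival order. The measured 6.5x and 7.1x gains reported above depend on the real platform and workloads, which this sketch does not attempt to reproduce.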