Vector architectures are gaining traction for highly efficient processing of data-parallel workloads, driven by all major ISAs (RISC-V, Arm, Intel) and boosted by landmark chips such as the Arm SVE-based Fujitsu A64FX, which powers the TOP500 leader Fugaku. The RISC-V V extension (RVV) has recently reached 1.0-Frozen status. Here, we present its first open-source implementation, discuss the new specification's impact on the micro-architecture of a lane-based design, and provide insights into the performance-oriented design of coupled scalar-vector processors. Our system achieves comparable or better power, performance, and area (PPA) than state-of-the-art vector engines that implement older RVV versions: 15% better area, 6% improved throughput, and FPU utilization above 98.5% on crucial kernels.