Leveraging vectorisation, the ability of a CPU to apply an operation to multiple data elements concurrently, is critical for high-performance workloads. However, at the time of writing, commercially available physical RISC-V hardware that provides the RISC-V vector extension (RVV) supports only version 0.7.1, which is incompatible with the latest ratified version, 1.0. The challenge is that upstream compiler toolchains, such as Clang, target only the ratified v1.0 and do not support the older v0.7.1. Consequently, the only way to compile vectorised code for this hardware is to use an older, vendor-provided compiler. In this paper we introduce the rvv-rollback tool, which translates compiler-generated assembly code that uses RVV v1.0 instructions into v0.7.1. We use this tool to compare the vectorisation performance of the vendor-provided GNU 8.4 compiler (which supports v0.7.1) against LLVM 15.0 (which supports only v1.0), and find that the LLVM compiler is capable of auto-vectorising more computational kernels and delivers greater performance than GNU in most, but not all, cases. We also tested LLVM vectorisation with vector-length-agnostic and vector-length-specific settings, and observed cases with significant differences in performance.
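To make the idea of translating v1.0 assembly to v0.7.1 concrete, the following is a minimal sketch of a line-by-line rewriter. It is not the rvv-rollback implementation: the function names `rollback_line` and `rollback` are hypothetical, and the three rewrite rules are only an illustrative subset chosen for this example. It assumes that unit-stride loads and stores use an element width equal to the SEW currently set by `vsetvli`, and that the code does not depend on the values of tail or masked-off elements.

```python
import re

# Illustrative subset of v1.0 -> v0.7.1 rewrites. This is an assumption made
# for the example, NOT the rule set of the actual rvv-rollback tool.

# v1.0 vsetvli carries tail/mask policy flags (ta/tu, ma/mu); v0.7.1 has none.
_POLICY = re.compile(r",\s*(?:ta|tu|ma|mu)\b")

# v1.0 encodes the element width in unit-stride loads/stores (vle32.v);
# v0.7.1 provides vle.v / vse.v, which use the current SEW instead.
_UNIT_STRIDE = re.compile(r"\bv(l|s)e(?:8|16|32|64)\.v\b")

# Instructions whose mnemonic changed between the two versions.
_RENAMES = {"vfredusum.vs": "vfredsum.vs"}


def rollback_line(line: str) -> str:
    """Rewrite one line of RVV v1.0 assembly into v0.7.1-style syntax."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return line  # blank lines and comments pass through unchanged
    mnemonic = stripped.split()[0]
    if mnemonic == "vsetvli":
        return _POLICY.sub("", line)  # drop the policy flags
    if mnemonic in _RENAMES:
        return line.replace(mnemonic, _RENAMES[mnemonic], 1)
    # Assumes the encoded element width equals the current SEW.
    return _UNIT_STRIDE.sub(lambda m: f"v{m.group(1)}e.v", line)


def rollback(asm: str) -> str:
    """Apply rollback_line to every line of an assembly listing."""
    return "\n".join(rollback_line(line) for line in asm.splitlines())
```

For example, `rollback_line("    vsetvli t0, a0, e32, m1, ta, ma")` yields `    vsetvli t0, a0, e32, m1`, and `vle32.v v8, (a1)` becomes `vle.v v8, (a1)`. A real translator has to handle far more than this, including instructions and features of v1.0 that have no direct v0.7.1 counterpart.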