Many libraries, such as OpenCV, FFmpeg, XNNPACK, and Eigen, use Arm or x86 SIMD intrinsics to optimize performance. With the emergence of the RISC-V Vector Extension (RVV), this performance-critical legacy code needs to be migrated to RVV. Currently, migrating NEON code to RVV requires manual rewriting, which is time-consuming and error-prone. In this work, we use the open-source tool SIMD Everywhere (SIMDe) to automate the migration. Our primary task is to enhance SIMDe so that it converts Arm NEON intrinsic types and functions to their corresponding RVV intrinsic types and functions. For type conversion, we devise strategies for mapping NEON intrinsic types to RVV intrinsic types that account for vector-length-agnostic (VLA) architectures. For function conversion, we analyze the conversion methods commonly used in SIMDe and develop a customized conversion for each function based on the RVV code it generates. In our experiments with the Google XNNPACK library, our enhanced SIMDe achieves speedups ranging from 1.51x to 5.13x over the original SIMDe, which does not use customized RVV implementations for the conversions.
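As a rough illustration of what a function conversion can look like, the following C sketch maps a 4-lane float addition (the role played by NEON's `vaddq_f32`) onto RVV intrinsics, with a scalar loop as a portable fallback. The names `demo_float32x4_t` and `demo_vaddq_f32` are hypothetical and are not SIMDe's actual types or functions; the sketch also assumes VLEN >= 128 so that one `m1` register holds all four lanes. It is not the implementation described in this work.

```c
#include <stddef.h>

#if defined(__riscv_vector)
#include <riscv_vector.h>
#endif

/* Hypothetical stand-in for NEON's float32x4_t: four 32-bit float lanes
 * stored in a fixed-size array, independent of the hardware vector length. */
typedef struct {
  float values[4];
} demo_float32x4_t;

/* Hypothetical counterpart of NEON's vaddq_f32: lane-wise addition. */
static demo_float32x4_t demo_vaddq_f32(demo_float32x4_t a, demo_float32x4_t b) {
  demo_float32x4_t r;
#if defined(__riscv_vector)
  /* Customized RVV path: request exactly 4 elements (vl = 4).
   * Assumption: VLEN >= 128, so an LMUL=1 register holds 4 floats. */
  size_t vl = 4;
  vfloat32m1_t va = __riscv_vle32_v_f32m1(a.values, vl);
  vfloat32m1_t vb = __riscv_vle32_v_f32m1(b.values, vl);
  vfloat32m1_t vr = __riscv_vfadd_vv_f32m1(va, vb, vl);
  __riscv_vse32_v_f32m1(r.values, vr, vl);
#else
  /* Portable fallback: plain scalar loop, left to the compiler to vectorize. */
  for (size_t i = 0; i < 4; i++) {
    r.values[i] = a.values[i] + b.values[i];
  }
#endif
  return r;
}
```

The fixed `vl = 4` reflects the mismatch the abstract points to: NEON types have a fixed 128-bit width, whereas RVV registers have an implementation-defined length, so a converted function has to state explicitly how many lanes it operates on.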