A cross-configuration benchmark is proposed to explore the capabilities and limitations of AVX / NEON intrinsic functions in the generic context of a development project where a vectorisation strategy is required to optimise the code. The main aim is to help developers decide when to use intrinsic functions, depending on the OS, architecture and/or available compiler. Intrinsic functions were observed to be highly efficient for conditional branching, with the intrinsic version running in around 5% of the execution time of the plain code. However, intrinsic functions were found to be unnecessary in many cases, as compilers already auto-vectorise the code well.
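To make the conditional-branching case concrete, here is a minimal sketch in C of the kind of kernel such a comparison could involve. It is not taken from the benchmark itself: the function names `select_plain` and `select_intrin`, the threshold test and the doubling operation are all hypothetical, chosen only to show how a per-element `if` can be replaced by a compare followed by a lane-wise select.

```c
#include <stddef.h>

#if defined(__AVX__)
#include <immintrin.h>
#elif defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Plain version (hypothetical kernel): one data-dependent branch per element. */
void select_plain(const float *in, float *out, size_t n, float threshold)
{
    for (size_t i = 0; i < n; ++i) {
        if (in[i] > threshold)
            out[i] = in[i] * 2.0f;
        else
            out[i] = 0.0f;
    }
}

/* Intrinsic version: the branch becomes a vector compare that produces a
 * mask, followed by a select between the two possible results.
 * Falls back to scalar code when neither AVX nor NEON is enabled. */
void select_intrin(const float *in, float *out, size_t n, float threshold)
{
    size_t i = 0;
#if defined(__AVX__)
    const __m256 thr  = _mm256_set1_ps(threshold);
    const __m256 two  = _mm256_set1_ps(2.0f);
    const __m256 zero = _mm256_setzero_ps();
    for (; i + 8 <= n; i += 8) {                 /* 8 floats per iteration */
        __m256 x    = _mm256_loadu_ps(in + i);
        __m256 mask = _mm256_cmp_ps(x, thr, _CMP_GT_OQ);
        __m256 y    = _mm256_mul_ps(x, two);
        /* Take y where the mask is set, zero elsewhere. */
        _mm256_storeu_ps(out + i, _mm256_blendv_ps(zero, y, mask));
    }
#elif defined(__ARM_NEON)
    const float32x4_t thr  = vdupq_n_f32(threshold);
    const float32x4_t zero = vdupq_n_f32(0.0f);
    for (; i + 4 <= n; i += 4) {                 /* 4 floats per iteration */
        float32x4_t x    = vld1q_f32(in + i);
        uint32x4_t  mask = vcgtq_f32(x, thr);
        float32x4_t y    = vmulq_n_f32(x, 2.0f);
        /* Take y where the mask is set, zero elsewhere. */
        vst1q_f32(out + i, vbslq_f32(mask, y, zero));
    }
#endif
    /* Scalar tail (and full fallback when no SIMD path is compiled in). */
    for (; i < n; ++i)
        out[i] = in[i] > threshold ? in[i] * 2.0f : 0.0f;
}
```

On x86 the AVX path is only compiled when the compiler is told to target it (for example `-mavx` with GCC or Clang); without that, the function silently uses the scalar loop. Whether the intrinsic version actually beats the plain one depends on the compiler and optimisation level, since a loop this simple may already be auto-vectorised.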