Leveraging the SIMD capabilities of modern CPU architectures is essential to take full advantage of their increasing performance. To exploit these capabilities, binary executables must be explicitly vectorized, either by developers or by an automatic vectorization tool. This is why the compilation research community has developed several strategies for transforming scalar code into a vectorized implementation. However, most approaches focus on regular algorithms, such as affine loops, that can be vectorized with few data transformations. In this paper, we present a new approach that automatically vectorizes scalar code with chaotic data accesses, as long as its operations can be statically inferred. We describe how our method transforms a graph of scalar instructions into a vectorized one, using different heuristics that aim to reduce the number or cost of the instructions. Finally, we demonstrate the benefits of our approach on various computational kernels using Intel AVX-512 and ARM SVE.
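To make the idea of vectorizing code with irregular but statically known accesses more concrete, the sketch below contrasts a small scalar kernel with a hand-packed equivalent. It is only an illustration under stated assumptions, not the method or heuristics of the paper: the names `Buf`, `Vec`, `gather`, `kernel_scalar`, and `kernel_packed` are hypothetical, a 4-lane `std::array` stands in for a hardware vector register, and the plain loops stand in for single gather, multiply, and add instructions such as those available in AVX-512 or SVE.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t W = 4;                 // assumed vector width (lanes)
using Buf = std::array<double, 8>;           // hypothetical input buffer
using Vec = std::array<double, W>;           // stand-in for a vector register
using Idx = std::array<std::size_t, W>;      // one index per lane

// Stand-in for a hardware gather: lane i loads base[idx[i]].
Vec gather(const Buf& base, const Idx& idx) {
    Vec r{};
    for (std::size_t i = 0; i < W; ++i) {
        assert(idx[i] < base.size());
        r[i] = base[idx[i]];
    }
    return r;
}

// Stand-ins for lane-wise vector multiply and add.
Vec mul(const Vec& x, const Vec& y) {
    Vec r{};
    for (std::size_t i = 0; i < W; ++i) r[i] = x[i] * y[i];
    return r;
}

Vec add(const Vec& x, const Vec& y) {
    Vec r{};
    for (std::size_t i = 0; i < W; ++i) r[i] = x[i] + y[i];
    return r;
}

// Scalar form: four independent statements with irregular indices.
// The indices follow no affine pattern, but all are known statically.
Vec kernel_scalar(const Buf& a, const Buf& b, const Buf& c) {
    Vec out{};
    out[0] = a[3] * b[7] + c[1];
    out[1] = a[0] * b[2] + c[5];
    out[2] = a[6] * b[4] + c[0];
    out[3] = a[1] * b[1] + c[2];
    return out;
}

// Packed form: the four multiplies share one vector multiply and the
// four adds share one vector add; the irregular accesses become gathers.
Vec kernel_packed(const Buf& a, const Buf& b, const Buf& c) {
    const Vec va = gather(a, Idx{3, 0, 6, 1});
    const Vec vb = gather(b, Idx{7, 2, 4, 1});
    const Vec vc = gather(c, Idx{1, 5, 0, 2});
    return add(mul(va, vb), vc);
}
```

In this sketch, eight scalar arithmetic operations become two vector operations plus three gathers. Whether such a packing pays off depends on the relative cost of the gathers, which is the kind of trade-off a cost-driven heuristic would have to weigh.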