Vectorization is a powerful optimization technique that significantly boosts the performance of high-performance computing applications operating on large data arrays. Despite decades of research on auto-vectorization, compilers frequently miss opportunities to vectorize code. At the same time, writing vectorized code by hand with compiler intrinsics remains a complex, error-prone task that demands deep knowledge of specific architectures and compilers. In this paper, we evaluate the potential of large language models (LLMs) to generate vectorized (Single Instruction, Multiple Data) code from scalar programs that process individual array elements. We propose a novel multi-agent approach, structured as a finite-state machine, that harnesses LLMs and test-based feedback to generate vectorized code. Our findings indicate that LLMs can produce high-performance vectorized code, with run-time speedups ranging from 1.1x to 9.4x compared to state-of-the-art compilers such as the Intel Compiler, GCC, and Clang. To verify the correctness of the vectorized code, we use Alive2, a leading bounded translation validation tool for LLVM IR. We describe a few domain-specific techniques that improve the scalability of Alive2 on our benchmark dataset. Overall, our approach verifies 38.2% of vectorizations as correct on the TSVC benchmark dataset.
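To make the idea of a finite-state machine driving generation with test-based feedback more concrete, the following is a minimal sketch, not the implementation evaluated in this paper. The state names, the `vectorize` driver, the `run_tests` helper, and the `stub_generator` that stands in for an LLM are all hypothetical and chosen only for illustration. Plain Python lists processed in chunks of four stand in for SIMD lanes.

```python
from enum import Enum, auto


class State(Enum):
    # Hypothetical states; the abstract does not name the actual ones.
    GENERATE = auto()
    TEST = auto()
    REPAIR = auto()
    DONE = auto()
    FAILED = auto()


def scalar_add(a, b):
    """Reference scalar kernel: one array element per iteration."""
    return [a[i] + b[i] for i in range(len(a))]


def buggy_chunked_add(a, b):
    """Processes 4 'lanes' at a time but forgets the leftover tail elements."""
    out = []
    for i in range(0, len(a) - len(a) % 4, 4):
        out.extend(x + y for x, y in zip(a[i:i + 4], b[i:i + 4]))
    return out


def fixed_chunked_add(a, b):
    """Same chunked loop, followed by a scalar loop for the tail."""
    main = len(a) - len(a) % 4
    out = []
    for i in range(0, main, 4):
        out.extend(x + y for x, y in zip(a[i:i + 4], b[i:i + 4]))
    out.extend(a[i] + b[i] for i in range(main, len(a)))
    return out


def stub_generator(feedback):
    """Stand-in for an LLM call (assumption): first attempt is buggy,
    any attempt made with feedback returns the repaired kernel."""
    return buggy_chunked_add if feedback is None else fixed_chunked_add


def run_tests(candidate, reference):
    """Return None if all tests pass, else a feedback string."""
    for n in (0, 1, 4, 7, 16):
        a = [float(i) for i in range(n)]
        b = [float(2 * i) for i in range(n)]
        want = reference(a, b)
        try:
            got = candidate(a, b)
        except Exception as exc:
            return f"n={n}: raised {exc!r}"
        if got != want:
            return f"n={n}: expected {want}, got {got}"
    return None


def vectorize(reference, generate, max_repairs=3):
    """Drive generate -> test -> repair until tests pass or the budget runs out."""
    state, candidate, feedback, repairs = State.GENERATE, None, None, 0
    trace = []
    while state not in (State.DONE, State.FAILED):
        trace.append(state)
        if state is State.GENERATE:
            candidate = generate(None)
            state = State.TEST
        elif state is State.TEST:
            feedback = run_tests(candidate, reference)
            if feedback is None:
                state = State.DONE
            elif repairs < max_repairs:
                state = State.REPAIR
            else:
                state = State.FAILED
        elif state is State.REPAIR:
            repairs += 1
            candidate = generate(feedback)
            state = State.TEST
    trace.append(state)
    return (candidate if state is State.DONE else None), trace
```

Calling `vectorize(scalar_add, stub_generator)` walks through one failed test run, one repair, and a passing test run. Passing tests only shows agreement on the sampled inputs, which is why a separate verification step, such as the Alive2-based checking described above, is still needed.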