Graphs have become a key tool for modeling and solving problems in many areas. The Floyd-Warshall (FW) algorithm computes the shortest paths between all pairs of vertices in a graph and is used in communication networking, traffic routing, and bioinformatics, among other fields. However, FW is expensive in both time and space: it requires O(n^3) operations and O(n^2) memory. As graphs grow larger, parallel computing becomes necessary to obtain a solution in an acceptable time. In this paper, we study an FW code developed for Xeon Phi KNL processors and adapt it to run on any Intel x86 processor, so that it is no longer specific to KNL. To do so, we verify one by one the optimizations proposed in the original code, adjust the base code where necessary, and analyze its performance on two Intel servers under different test scenarios. In addition, we propose a new optimization that increases the degree of concurrency of the parallel algorithm, and we implement it using two different synchronization mechanisms. The experimental results show that all optimizations were beneficial on both selected x86 platforms. Finally, the newly proposed optimization improved performance by up to 23%.
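For readers unfamiliar with the algorithm, the O(n^3) operation count and O(n^2) memory footprint mentioned above can be seen in a minimal sequential sketch. This is the textbook form of FW, not the optimized or parallel code studied in the paper; the function name `floyd_warshall` and the adjacency-matrix input format are assumptions made for illustration.

```python
import math


def floyd_warshall(weights):
    """Return the all-pairs shortest-path distance matrix.

    `weights` is an n x n list of lists (hypothetical input format):
    weights[i][j] is the weight of edge i -> j, math.inf if there is
    no edge, and 0 on the diagonal. Negative cycles are assumed absent.
    """
    n = len(weights)
    # Copy the input so the caller's matrix is untouched: O(n^2) memory.
    dist = [row[:] for row in weights]

    # Three nested loops over n vertices: O(n^3) operations.
    for k in range(n):              # k is the intermediate vertex allowed so far
        dist_k = dist[k]
        for i in range(n):
            d_ik = dist[i][k]
            if d_ik == math.inf:    # no path i -> k, nothing can improve
                continue
            row_i = dist[i]
            for j in range(n):
                candidate = d_ik + dist_k[j]
                if candidate < row_i[j]:
                    row_i[j] = candidate
    return dist


if __name__ == "__main__":
    INF = math.inf
    example = [
        [0,   3,   INF, 7],
        [8,   0,   2,   INF],
        [5,   INF, 0,   1],
        [2,   INF, INF, 0],
    ]
    for row in floyd_warshall(example):
        print(row)
```

The outer `k` loop carries a dependency from one iteration to the next, which is why parallel versions generally distribute the work inside each `k` step and synchronize between steps.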