Graphs have become a key tool for modeling and solving problems in many areas. The Floyd-Warshall (FW) algorithm computes the shortest paths between all pairs of vertices in a graph and is used in areas such as communication networking, traffic routing, and bioinformatics. However, FW is expensive in both time and space, since it requires O(n^3) operations and O(n^2) memory for a graph with n vertices. As the graph grows, parallel computing becomes necessary to obtain a solution in an acceptable time. In this paper, we study an FW code developed for Xeon Phi KNL processors and adapt it to run on any Intel x86 processor, removing its dependence on that specific hardware. To do so, we verify the optimizations proposed in the original code one by one, adjust the base code where necessary, and analyze its performance on two Intel servers under different test scenarios. In addition, we propose a new optimization that increases the degree of concurrency of the parallel algorithm and implement it using two different synchronization mechanisms. The experimental results show that all optimizations were beneficial on the two selected x86 platforms. Finally, the newly proposed optimization improved performance by up to 23%.
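To make the cost figures above concrete, the following is a minimal sequential sketch of the textbook FW recurrence, not the optimized code studied in this paper. The function name `floyd_warshall`, the adjacency-matrix input format, and the sample graph are assumptions made for illustration only; the sketch also assumes the graph has no negative cycles.

```python
import math

INF = math.inf  # marks "no edge" between two vertices


def floyd_warshall(weights):
    """Return the all-pairs shortest-path distance matrix.

    weights[i][j] is the weight of edge i -> j, or INF if there is no edge.
    The input matrix is not modified.
    """
    n = len(weights)
    # One n x n distance matrix: this is the O(n^2) memory footprint.
    dist = [row[:] for row in weights]
    for i in range(n):
        dist[i][i] = min(dist[i][i], 0)

    # Three nested loops over n vertices: this is the O(n^3) operation count.
    for k in range(n):          # k is the intermediate vertex being allowed
        row_k = dist[k]
        for i in range(n):
            d_ik = dist[i][k]
            if d_ik == INF:
                continue        # no path i -> k, so k cannot improve row i
            row_i = dist[i]
            for j in range(n):
                candidate = d_ik + row_k[j]
                if candidate < row_i[j]:
                    row_i[j] = candidate
    return dist


# Hypothetical 4-vertex directed graph used only as an example input.
example = [
    [0,   3,   INF, 7],
    [8,   0,   2,   INF],
    [5,   INF, 0,   1],
    [2,   INF, INF, 0],
]
distances = floyd_warshall(example)
```

The outer `k` loop must run in order because each step builds on the previous one, while the `i` and `j` iterations are independent within a given `k`. That independence is what parallel versions of FW typically exploit.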