Influence Maximization (IM) is vital in viral marketing and biological network analysis for identifying key influencers. Given its NP-hard nature, approximate solutions are employed in practice. This paper addresses scalability challenges on scale-out shared-memory systems by focusing on the state-of-the-art Influence Maximization via Martingales (IMM) benchmark. To enhance the work efficiency of the current IMM implementation, we propose EFFICIENTIMM, which combines several key strategies: a new parallelization scheme, NUMA-aware memory usage, dynamic load balancing, and fine-grained adaptive data structures. Benchmarked on a 128-core CPU system with 8 NUMA nodes, EFFICIENTIMM demonstrates significant performance improvements, achieving an average 5.9x speedup over the best execution times of the original Ripples framework across 8 diverse SNAP datasets. Additionally, on the Youtube graph, EFFICIENTIMM exhibits a better memory access pattern, with a 357.4x reduction in L1+L2 cache misses compared to Ripples.