Message aggregation is often used to reduce communication cost in HPC applications. The order-of-magnitude difference between the overhead of sending a message and the per-byte transfer cost motivates message aggregation in irregular, fine-grained messaging applications such as graph algorithms and parallel discrete event simulation (PDES). While message aggregation is frequently used in the "MPI-everywhere" model to coalesce messages between processes mapped to cores, aggregation across threads within a process, for example in MPI+X models or Charm++ SMP (Shared Memory Parallelism) mode, is often avoided, because within-process coalescing is likely to require synchronization across threads and to cause performance problems from contention. As a result, however, SMP-unaware aggregation mechanisms may not fully exploit the aggregation opportunities available to applications in SMP mode. Additionally, while the benefit of message aggregation is usually analyzed in terms of reduced overhead, specifically the per-message cost, we also analyze schemes that can help reduce message latency, i.e., the time from when a message is sent to when it is received. Message latency matters for applications such as PDES with speculative execution, where lower latency could result in fewer rollbacks. To address these challenges, we demonstrate the effectiveness of shared-memory-aware message aggregation schemes for a range of proxy applications with respect to messaging overhead and latency.
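The overhead argument above can be illustrated with a simple linear cost model, in which each network send pays a fixed per-message cost plus a per-byte cost. The following is a minimal sketch under that assumption, not the scheme evaluated in this work; the function name `send_cost`, the parameters `alpha` and `beta`, and the numbers in the usage lines are hypothetical and chosen only for illustration.

```python
def send_cost(num_messages, bytes_per_message, alpha, beta, batch_size=1):
    """Total send cost under an assumed linear model.

    alpha      -- fixed cost paid once per network send (hypothetical units)
    beta       -- cost per byte transferred (same units as alpha)
    batch_size -- number of application messages coalesced into one send
    """
    # Ceiling division: a partially filled final batch still costs one send.
    num_sends = -(-num_messages // batch_size)
    total_bytes = num_messages * bytes_per_message
    # Aggregation reduces only the alpha term; the byte volume is unchanged.
    return num_sends * alpha + total_bytes * beta


# Illustrative values only: alpha is 1000x larger than beta.
unaggregated = send_cost(1024, 16, alpha=1000, beta=1)
aggregated = send_cost(1024, 16, alpha=1000, beta=1, batch_size=64)
print(unaggregated, aggregated)
```

This model captures only the overhead side of the trade-off. It says nothing about how long a message waits in a buffer before its batch is sent, which is the latency concern raised above.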