Early-bird communication is a communication/computation overlap technique that combines fine-grained communication with partitioned communication to improve application run time. Communication is divided among the compute threads so that each thread can initiate transmission of its portion of the data as soon as that portion is complete, rather than waiting for all of the threads to finish. However, the benefit of early-bird communication depends on the completion timing of the individual threads. In this paper, we measure and evaluate the potential overlap, i.e., the idle time each thread experiences between finishing its own computation and the final thread finishing. These measurements help us understand whether a given application could benefit from early-bird communication. We present our technique for gathering this data and evaluate data collected from three proxy applications: MiniFE, MiniMD, and MiniQMC. To characterize the behavior of these workloads, we study the thread timings at both a macro level, i.e., across all threads and all runs of an application, and a micro level, i.e., within a single process of a single run. We observe that these applications exhibit significantly different behavior. MiniFE and MiniQMC appear to be well-suited for early-bird communication because of their wider distribution of thread completion times and more frequent laggard threads, whereas the behavior of MiniMD may limit its ability to leverage early-bird communication.
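The potential-overlap metric described above can be illustrated with a minimal sketch. It assumes that a completion timestamp has been recorded for each thread in one parallel region; the function name `potential_overlap` and the sample timings are hypothetical and are not taken from the paper's measurement tooling.

```python
from typing import List, Sequence


def potential_overlap(finish_times: Sequence[float]) -> List[float]:
    """Return, for each thread, the idle time between its own finish
    and the finish of the last (laggard) thread in the same region."""
    if not finish_times:
        return []
    last_finish = max(finish_times)
    return [last_finish - t for t in finish_times]


# Hypothetical completion times (seconds) for four threads in one region.
finish = [1.0, 1.5, 1.25, 2.0]
print(potential_overlap(finish))  # [1.0, 0.5, 0.75, 0.0]
```

Under this reading, a thread with a large value finishes well before the laggard and could, in principle, start sending its partition early, while a region in which every value is close to zero leaves little room for overlap.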