Task-based runtime systems provide flexible load balancing and portability for parallel scientific applications, but their strong scaling is highly sensitive to task granularity. As parallelism increases, scheduling overhead can shift from negligible to dominant, causing rapid performance drops for some algorithms while remaining negligible for others. Although such effects are widely observed empirically, there is little understanding of how algorithmic structure determines whether dynamic scheduling remains beneficial. In this work, we introduce a granularity characterization framework that directly links the growth of scheduling overhead to task-graph dependency topology. We show that dependency structure, rather than problem size alone, governs how overhead scales with parallelism. Based on this observation, we characterize execution behavior using a simple granularity measure that indicates when scheduling overhead can be amortized by parallel computation and when it dominates performance. Through experimental evaluation on representative parallel workloads with diverse dependency patterns, we demonstrate that the proposed characterization explains both the gradual and the abrupt strong-scaling breakdowns observed in practice. We further show that overhead models derived from dependency topology accurately predict strong-scaling limits and enable a practical runtime decision rule for choosing between dynamic and static execution, without requiring exhaustive strong-scaling studies or extensive offline tuning.
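As a minimal sketch of the kind of granularity-based decision rule the abstract describes, consider the following. It assumes a ratio-style granularity measure (mean task time over per-task scheduling overhead) and a linear-in-workers overhead model; the function `choose_schedule`, its parameters, and the threshold value are illustrative assumptions, not the paper's actual measure or model.

```python
# Minimal sketch of a granularity-based runtime decision rule.
# All names, the overhead model, and the threshold are illustrative
# assumptions, not the measure or model proposed in the paper.

def choose_schedule(mean_task_time_s: float,
                    per_task_overhead_s: float,
                    overhead_growth_per_worker_s: float,
                    workers: int,
                    threshold: float = 10.0) -> str:
    """Pick dynamic scheduling only while overhead stays amortizable."""
    # Assumed topology-dependent overhead: a fixed per-task cost plus a
    # term that grows with worker count (e.g., queue contention on
    # densely connected dependency graphs).
    overhead = per_task_overhead_s + overhead_growth_per_worker_s * workers
    granularity = mean_task_time_s / overhead
    return "dynamic" if granularity >= threshold else "static"

if __name__ == "__main__":
    # Example: 100 us tasks, 1 us base overhead, 0.05 us/worker growth.
    # The rule flips from dynamic to static as parallelism grows.
    for p in (8, 64, 512, 4096):
        print(p, choose_schedule(100e-6, 1e-6, 0.05e-6, p))
```

Under these assumed numbers the rule selects dynamic execution at 8 and 64 workers but switches to static at 512 and beyond, illustrating how a growing topology-dependent overhead term produces the strong-scaling crossover the abstract refers to.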