In exascale-oriented GPU clusters, rigid-topology jobs leave behind a fragmented post-landing ecology in which long-resident workloads and highly transient tasks compete for unstable residual capacity. Existing centralized, hierarchical, and local-first decentralized schedulers incur growing coordination and retry-amplification costs in this regime and typically stop their explicit responsibility at execution start, leaving runtime survival to indiscriminate host-level OOM heuristics. We present Laminar, a decentralized probe-first, execute-later scheduling paradigm that keeps hot-path control-plane work near $\mathcal{O}(1)$ through Zone-level probabilistic flow splitting, bounded in-Zone probing by persistent lightweight agents, and node-local arbitration. Laminar further introduces Airlock, a bounded node-local runtime-survival layer that converts severe memory pressure into an ordered policy of suspension, in-situ recovery, bounded secondary re-addressing, or reclamation. By enforcing priority-ordered survival under pressure, Laminar enables lifecycle-aware scheduling that preserves high-value long-resident work and operates closer to physical saturation without relying on protocol-level overcommitment.
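As a reading aid, the probe-first, execute-later flow summarized above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Laminar's actual implementation: the `Node` class, the `zone_weights` table, the `probe_budget` parameter, and the memory-only acceptance rule are all hypothetical names and simplifications introduced here. The sketch shows only the three hot-path steps the abstract names: a weighted random choice of Zone, a bounded number of in-Zone probes, and an accept-or-refuse decision taken by the node itself.

```python
import random
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical node agent; tracks only residual memory in GiB."""
    name: str
    free_gib: float

    def probe(self, need_gib: float) -> bool:
        # Node-local arbitration (assumed rule): the node alone decides
        # whether to accept, and reserves capacity if it does.
        if self.free_gib >= need_gib:
            self.free_gib -= need_gib
            return True
        return False


def pick_zone(zone_weights: dict, rng: random.Random) -> str:
    # Zone-level probabilistic flow splitting: one weighted random draw,
    # with no per-node global state consulted on the hot path.
    zones = list(zone_weights)
    return rng.choices(zones, weights=[zone_weights[z] for z in zones], k=1)[0]


def place(need_gib, zones, zone_weights, probe_budget, rng):
    """Probe at most `probe_budget` nodes in one Zone; return a node name or None."""
    zone = pick_zone(zone_weights, rng)
    members = zones[zone]
    # Bounded in-Zone probing: the number of probes never exceeds the budget,
    # regardless of cluster size.
    for node in rng.sample(members, k=min(probe_budget, len(members))):
        if node.probe(need_gib):
            return node.name  # execution would start only after this acceptance
    return None  # budget exhausted; the caller decides what to do next


if __name__ == "__main__":
    rng = random.Random(0)
    zones = {"a": [Node("a1", 8.0), Node("a2", 32.0)], "b": [Node("b1", 64.0)]}
    weights = {"a": 0.7, "b": 0.3}  # assumed split; not taken from the paper
    print(place(16.0, zones, weights, probe_budget=2, rng=rng))
```

Because the probe count is capped by `probe_budget` rather than by the number of nodes, the per-task control-plane work in this sketch stays constant as the cluster grows, which is the property the abstract attributes to the design.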