Efficient scheduling of directed acyclic graphs (DAGs) is a core problem in large-scale data-intensive computing systems, where query plans, data-processing workloads, and computation graphs consist of dependent tasks competing for limited heterogeneous resource pools. In practice, achieving high-performance execution requires schedulers to adapt across environments with varying resource pools and task types, while generating schedules under tight runtime budgets. We propose WeCAN, an end-to-end reinforcement learning framework for heterogeneous DAG scheduling that addresses task-pool compatibility coefficients and generation-induced optimality gaps. It adopts a two-stage single-pass design: a single forward pass produces task-pool scores and global parameters, followed by a generation map that constructs schedules without repeated network calls. Its weighted cross-attention encoder models task-pool interactions gated by compatibility coefficients, and is size-agnostic to environment fluctuations. Moreover, widely used list-scheduling maps can incur generation-induced optimality gaps from restricted reachability. We introduce an order-space analysis that characterizes the reachable set of generation maps via feasible schedule orders, explains the mechanism behind generation-induced gaps, and yields sufficient conditions for gap elimination. Guided by these conditions, we design a skip-extended realization with an analytically parameterized decreasing skip rule, which enlarges the reachable order set while preserving single-pass efficiency. Experiments on real-world TPC-H query DAGs, resource-intensive workload datasets, and ML-compiler computation graphs demonstrate improved makespan over strong baselines, with inference time comparable to classical heuristics and faster than multi-round neural schedulers.
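To make the notion of a generation map concrete, the following is a minimal sketch of a generic greedy list-scheduling map, not WeCAN's skip-extended rule. It assumes the task-pool scores are already available (here a hypothetical stand-in, the negated processing time, replaces the output of the network's forward pass) and that a task is compatible with a pool exactly when a duration is listed for that pair. The names `list_schedule`, `preds`, `duration`, and `score` are illustrative and do not come from the paper.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (task, pool)


def list_schedule(
    preds: Dict[str, List[str]],
    duration: Dict[Pair, float],
    score: Dict[Pair, float],
    pools: List[str],
) -> Tuple[Dict[str, Tuple[str, float, float]], float]:
    """Turn fixed task-pool scores into a feasible schedule without any
    further scoring calls. Returns ({task: (pool, start, end)}, makespan)."""
    finish: Dict[str, float] = {}
    pool_free = {p: 0.0 for p in pools}
    schedule: Dict[str, Tuple[str, float, float]] = {}
    remaining = set(preds)
    while remaining:
        # A task is ready once all of its predecessors have been placed.
        ready = sorted(t for t in remaining if all(u in finish for u in preds[t]))
        # Only compatible (task, pool) pairs are candidates.
        candidates = [(t, p) for t in ready for p in pools if (t, p) in duration]
        if not candidates:
            raise ValueError("cyclic dependencies or a task with no compatible pool")
        # Pick the highest-scoring pair; ties go to the first candidate listed.
        t, p = max(candidates, key=lambda tp: score[tp])
        start = max([pool_free[p]] + [finish[u] for u in preds[t]])
        end = start + duration[(t, p)]
        schedule[t] = (p, start, end)
        finish[t] = end
        pool_free[p] = end
        remaining.remove(t)
    return schedule, max(finish.values(), default=0.0)


# Toy diamond DAG: a -> b, a -> c, {b, c} -> d, on two heterogeneous pools.
preds = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
pools = ["cpu", "gpu"]
duration = {
    ("a", "cpu"): 2.0, ("a", "gpu"): 4.0,
    ("b", "cpu"): 3.0, ("b", "gpu"): 1.0,
    ("c", "cpu"): 2.0,  # c is only compatible with the cpu pool
    ("d", "cpu"): 1.0, ("d", "gpu"): 1.0,
}
# Hypothetical scores standing in for a learned model: prefer shorter runs.
score = {pair: -dur for pair, dur in duration.items()}

schedule, makespan = list_schedule(preds, duration, score, pools)
for task in sorted(schedule):
    print(task, schedule[task])
print("makespan:", makespan)
```

Because this map commits to one highest-scoring pair at a time, the set of task orders it can produce is limited by the selection rule, which is one way to picture the restricted reachability that the abstract attributes to list-scheduling maps.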