We study the problem of assigning the operations of a dataflow graph to devices so as to minimize execution time in a work-conserving system, with an emphasis on complex machine learning workloads. Prior learning-based methods often struggle because of three key limitations: (1) reliance on bulk-synchronous systems such as TensorFlow, which under-utilize devices because of barrier synchronization; (2) no awareness of the underlying system's scheduling mechanism in the design of the learning-based method; and (3) exclusive dependence on reinforcement learning, which ignores the structure of effective heuristics designed by experts. In this paper, we propose Doppler, a three-stage framework for training dual-policy networks consisting of (1) a $\mathsf{SEL}$ policy that selects operations and (2) a $\mathsf{PLC}$ policy that places the selected operations on devices. Our experiments show that Doppler outperforms all baseline methods across tasks by reducing system execution time, and that it additionally demonstrates sampling efficiency by reducing per-episode training time.
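To make the select-then-place decomposition concrete, the following is a minimal sketch of an assignment loop driven by two pluggable policies. It is an illustration under stated assumptions, not Doppler's implementation: the function names (`assign`, `sel_longest_first`, `plc_earliest_start`), the graph encoding, and the cost model are hypothetical, the two policies are hand-written stand-in heuristics rather than trained networks, and communication cost between devices is ignored.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical encoding: each operation maps to the list of its predecessors.
Graph = Dict[str, List[str]]


def assign(
    graph: Graph,
    cost: Dict[str, float],
    num_devices: int,
    sel: Callable[[List[str], Dict[str, float]], str],
    plc: Callable[[str, float, List[float]], int],
) -> Tuple[Dict[str, int], float]:
    """Repeatedly select a ready operation (sel) and place it on a device (plc)."""
    placement: Dict[str, int] = {}
    finish: Dict[str, float] = {}
    device_free = [0.0] * num_devices  # time at which each device becomes idle

    while len(placement) < len(graph):
        # An operation is ready once all of its predecessors have been assigned.
        ready = sorted(
            op for op in graph
            if op not in placement and all(p in placement for p in graph[op])
        )
        op = sel(ready, cost)
        # The operation cannot start before its inputs are available.
        earliest = max((finish[p] for p in graph[op]), default=0.0)
        dev = plc(op, earliest, device_free)
        start = max(earliest, device_free[dev])
        finish[op] = start + cost[op]
        device_free[dev] = finish[op]
        placement[op] = dev

    return placement, max(finish.values())


def sel_longest_first(ready: List[str], cost: Dict[str, float]) -> str:
    """Stand-in for a learned SEL policy: pick the most expensive ready operation."""
    return max(ready, key=lambda op: cost[op])


def plc_earliest_start(op: str, earliest: float, device_free: List[float]) -> int:
    """Stand-in for a learned PLC policy: pick the device that can start soonest."""
    return min(range(len(device_free)), key=lambda d: max(earliest, device_free[d]))


# A diamond-shaped graph: a feeds b and c, which both feed d.
graph = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
cost = {"a": 1, "b": 2, "c": 3, "d": 1}
placement, makespan = assign(graph, cost, 2, sel_longest_first, plc_earliest_start)
print(placement, makespan)
```

On this toy graph the sketch yields a makespan of 5.0, which equals the critical path a, c, d (1 + 3 + 1). In a learned setting, the two stand-in functions would be replaced by policy networks and the simulated makespan by measured execution time on the real system.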