A structure-aware framework for learning device placements on computation graphs

Shukai Duan, Heng Ping, Nikos Kanakaris, Xiongye Xiao, Peiyu Zhang, Panagiotis Kyriakis, Nesreen K. Ahmed, Guixiang Ma, Mihai Capota, Shahin Nazarian, Theodore L. Willke, Paul Bogdan

Existing approaches for device placement ignore the topological features of computation graphs and rely mostly on heuristic methods for graph partitioning. At the same time, they follow either a grouper-placer or an encoder-placer architecture, which requires understanding the interaction structure between code operations. To bridge the gap between encoder-placer and grouper-placer techniques, we propose a novel reinforcement-learning framework for device placement that operates on smaller computation graphs extracted from the OpenVINO toolkit. The framework consists of five steps, including graph coarsening, node representation learning, and policy optimization. It supports end-to-end training and takes into account the directed and acyclic nature of computation graphs. We also propose a model variant, inspired by graph parsing networks and complex network analysis, that performs graph representation learning and personalized graph partitioning jointly, without fixing the number of groups in advance. To train the entire framework, we use reinforcement learning, formulating the reward from the execution time of the suggested device placements. We demonstrate the flexibility and effectiveness of our approach through multiple experiments with three benchmark models, namely Inception-V3, ResNet, and BERT, and we highlight the robustness of the proposed framework through an ablation study. The suggested placements improve the inference speed of the benchmark models by up to $58.2\%$ over CPU execution and by up to $60.24\%$ compared to other commonly used baselines.
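To make the training signal described above concrete, the following is a minimal sketch, not the authors' implementation, of policy-gradient device placement with a reward derived from execution time. Everything in it is an illustrative assumption: the toy graph `DAG`, the cost table `COST`, the transfer cost `COMM`, and the helpers `simulate_runtime` and `train` are hypothetical. A per-node softmax table stands in for the coarsening and graph-encoder stages. The reward is taken as the negative simulated runtime, and the update is plain REINFORCE with a moving-average baseline. The runtime model respects the DAG structure by scheduling operations in topological order.

```python
import math
import random
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical toy computation graph: node -> set of predecessor nodes.
DAG = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
# Hypothetical per-device compute cost (ms) for each node, indexed like DEVICES.
COST = {"a": (4.0, 1.0), "b": (6.0, 1.5), "c": (6.0, 1.5), "d": (3.0, 1.0)}
COMM = 2.0  # hypothetical transfer cost (ms) when an edge crosses devices
DEVICES = ("CPU", "GPU")


def simulate_runtime(placement):
    """Toy stand-in for measured execution time (an assumption, not a real
    measurement): nodes run in topological order, each device executes one
    op at a time, and every cross-device edge adds COMM before the consumer."""
    device_free = {d: 0.0 for d in range(len(DEVICES))}
    finish = {}
    for node in TopologicalSorter(DAG).static_order():
        dev = placement[node]
        ready = max(
            (finish[p] + (COMM if placement[p] != dev else 0.0) for p in DAG[node]),
            default=0.0,
        )
        start = max(ready, device_free[dev])
        finish[node] = start + COST[node][dev]
        device_free[dev] = finish[node]
    return max(finish.values())


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(steps=500, lr=0.1, seed=0):
    """REINFORCE with a moving-average baseline; reward = -simulated runtime.
    Returns the most likely device index for each node after training."""
    rng = random.Random(seed)
    logits = {n: [0.0] * len(DEVICES) for n in DAG}
    baseline = None
    for _ in range(steps):
        probs = {n: softmax(logits[n]) for n in DAG}
        # Sample one device per node from the current policy.
        placement = {
            n: rng.choices(range(len(DEVICES)), weights=probs[n])[0] for n in DAG
        }
        reward = -simulate_runtime(placement)
        baseline = reward if baseline is None else 0.9 * baseline + 0.1 * reward
        advantage = reward - baseline
        for n in DAG:
            for k in range(len(DEVICES)):
                # Gradient of log softmax(z)[a] w.r.t. z_k is 1[k == a] - p_k.
                grad = (1.0 if k == placement[n] else 0.0) - probs[n][k]
                logits[n][k] += lr * advantage * grad
    return {n: max(range(len(DEVICES)), key=lambda k: logits[n][k]) for n in DAG}
```

On this toy graph, placing every node on the GPU gives a simulated runtime of 5.0 ms versus 19.0 ms on the CPU alone, so `simulate_runtime(train())` can be compared against those two reference points. Measuring real latencies (for example, with OpenVINO) in place of `simulate_runtime` would make each reward evaluation far more expensive, which is one reason baselines and graph coarsening matter in practice.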
