The key to the device-edge co-inference paradigm is to partition a model into a computation-friendly part and a computation-intensive part, deployed on the device and the edge, respectively. However, for Graph Neural Networks (GNNs), we find that simply partitioning a model without altering its structure can hardly realize the full potential of co-inference, because GNN operations incur diverse computation and communication overheads across heterogeneous devices. We present GCoDE, the first automatic framework for GNNs that Co-designs the architecture search and the mapping of each operation onto Device-Edge hierarchies. GCoDE abstracts the device communication process into an explicit operation and fuses architecture search and operation mapping into a unified space for joint optimization. In addition, the performance-aware approach used in GCoDE's constraint-based search enables effective evaluation of architecture efficiency across diverse heterogeneous systems. We also implement a co-inference engine and a runtime dispatcher in GCoDE to improve deployment efficiency. Experimental results show that GCoDE achieves up to $44.9\times$ speedup and $98.2\%$ energy reduction compared with existing approaches across various applications and system configurations.
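To make the idea of treating communication as an explicit operation inside a joint architecture-and-mapping space more concrete, here is a toy Python sketch. It is only an illustration under stated assumptions: the op names (`gcn`, `mlp`), the latency, payload, and bandwidth numbers, the accuracy-proxy `score`, and the exhaustive search are all hypothetical. They are not GCoDE's actual search space, performance predictor, or search strategy.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class OpProfile:
    device_ms: float  # hypothetical latency on the device
    edge_ms: float    # hypothetical latency on the edge server
    out_kb: float     # hypothetical size of the op's output tensor
    score: float      # hypothetical accuracy proxy (higher is better)


# Illustrative numbers only; not measurements from GCoDE.
OPS = {
    "gcn": OpProfile(device_ms=30.0, edge_ms=3.0, out_kb=40.0, score=3.0),
    "mlp": OpProfile(device_ms=4.0, edge_ms=1.0, out_kb=8.0, score=1.0),
}
INPUT_KB = 80.0            # assumed raw input size, produced on the device
BANDWIDTH_KB_PER_MS = 4.0  # assumed device-to-edge link bandwidth


def expand(ops, placement):
    """Turn (ops, placement) into a plan with explicit 'comm' ops.

    Data starts on the device; a 'comm' op carrying the previous output is
    inserted whenever consecutive ops run on different sides.
    """
    plan, where, last_kb = [], "device", INPUT_KB
    for op, side in zip(ops, placement):
        if side != where:
            plan.append(("comm", last_kb))
            where = side
        plan.append((op, side))
        last_kb = OPS[op].out_kb
    return plan


def latency_ms(plan):
    """Sum per-side compute latency plus transfer time of each 'comm' op."""
    total = 0.0
    for name, arg in plan:
        if name == "comm":
            total += arg / BANDWIDTH_KB_PER_MS
        else:
            prof = OPS[name]
            total += prof.device_ms if arg == "device" else prof.edge_ms
    return total


def search(num_layers, budget_ms):
    """Exhaustive search over the joint (architecture x mapping) space.

    Returns (score, latency, plan) with the highest proxy score whose latency
    fits the budget (ties broken by lower latency), or None if nothing fits.
    """
    best = None
    for ops in product(OPS, repeat=num_layers):
        score = sum(OPS[op].score for op in ops)
        for placement in product(("device", "edge"), repeat=num_layers):
            plan = expand(ops, placement)
            lat = latency_ms(plan)
            if lat > budget_ms:
                continue
            if best is None or (score, -lat) > (best[0], -best[1]):
                best = (score, lat, plan)
    return best


print(search(2, budget_ms=20.0))
# → (4.0, 9.0, [('mlp', 'device'), ('comm', 8.0), ('gcn', 'edge')])
```

Under a 20 ms budget, the sketch picks a layer order that a fixed model would not: a cheap `mlp` runs on the device to shrink the payload before the transfer, and the heavy `gcn` runs on the edge. Searching layer choice and placement separately would miss this combination, which is the kind of interaction a unified space is meant to capture.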