Knowledge Distillation on Spatial-Temporal Graph Convolutional Network for Traffic Prediction

Efficient real-time traffic prediction is crucial for reducing transportation time. To predict traffic conditions, we employ a spatial-temporal graph neural network (ST-GNN) that models real-time traffic data as temporal graphs. Despite its capabilities, an ST-GNN often struggles to deliver predictions quickly enough for real-world traffic data. Because real-time data changes continuously and predictions must be timely, we use knowledge distillation (KD) to reduce the execution time of ST-GNNs for traffic prediction. In this paper, we introduce a cost function designed to train a network with fewer parameters (the student) using distilled data from a complex network (the teacher) while keeping the student's accuracy close to that of the teacher. The distillation incorporates spatial-temporal correlations from the teacher network, enabling the student to learn the complex patterns perceived by the teacher. A remaining challenge is that the student architecture should be determined deliberately rather than chosen arbitrarily. To address this, we propose an algorithm that uses the cost function to calculate pruning scores, which addresses the search for a small network architecture, and that jointly fine-tunes the network resulting from each pruning stage using KD. Finally, we evaluate the proposed ideas on two real-world datasets, PeMSD7 and PeMSD8. The results indicate that our method keeps the student's accuracy close to that of the teacher even when only $3\%$ of the network parameters are retained.
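The abstract does not give the form of the cost function, so the following is only a minimal sketch under stated assumptions: the cost is taken to be a weighted sum of the student's error against the ground truth and against the teacher's output, and a pruning score is taken to be the increase in that cost when one parameter is zeroed. The names `distillation_cost`, `pruning_scores`, `linear_predict`, and the mixing weight `alpha` are hypothetical, and the spatial-temporal correlation terms mentioned in the abstract are omitted.

```python
from typing import Callable, List, Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equal-length sequences."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def distillation_cost(
    student_pred: Sequence[float],
    teacher_pred: Sequence[float],
    ground_truth: Sequence[float],
    alpha: float = 0.5,  # assumed mixing weight, not taken from the paper
) -> float:
    """Assumed cost: (1 - alpha) * task error + alpha * teacher-matching error."""
    task_term = mse(student_pred, ground_truth)
    distill_term = mse(student_pred, teacher_pred)
    return (1.0 - alpha) * task_term + alpha * distill_term


def linear_predict(weights: Sequence[float], inputs: Sequence[Sequence[float]]) -> List[float]:
    """Toy stand-in for the student network: one dot product per input row."""
    return [sum(w * x for w, x in zip(weights, row)) for row in inputs]


def pruning_scores(
    weights: Sequence[float],
    predict: Callable[[Sequence[float], Sequence[Sequence[float]]], List[float]],
    inputs: Sequence[Sequence[float]],
    teacher_pred: Sequence[float],
    ground_truth: Sequence[float],
    alpha: float = 0.5,
) -> List[float]:
    """Score each parameter by how much the cost rises when it is zeroed.

    A low score marks a parameter that can be pruned cheaply (assumed criterion).
    """
    base = distillation_cost(predict(weights, inputs), teacher_pred, ground_truth, alpha)
    scores = []
    for i in range(len(weights)):
        masked = list(weights)
        masked[i] = 0.0  # remove parameter i
        cost = distillation_cost(predict(masked, inputs), teacher_pred, ground_truth, alpha)
        scores.append(cost - base)
    return scores
```

In a prune-then-fine-tune loop of the kind the abstract describes, the lowest-scoring parameters would be removed at each stage and the remaining network fine-tuned against the same cost before the next stage.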


