Although convolutional neural networks (CNN) have achieved excellent performance on vision tasks by extracting intra-sample representations, they incur a high training expense because they stack numerous convolutional layers. Recently, graph neural networks (GNN), which are essentially bilinear models, have succeeded in exploring the underlying topological relationships among graph data with only a few graph neural layers. Unfortunately, GNN cannot be directly applied to non-graph data due to the lack of a graph structure, and it suffers high inference latency in large-scale scenarios. Inspired by these complementary strengths and weaknesses, \textit{we discuss a natural question: how can we bridge these two heterogeneous networks?} In this paper, we propose a novel CNN2GNN framework that unifies CNN and GNN via distillation. First, to overcome the limitations of GNN, a differentiable sparse graph learning module is designed as the head of the network to dynamically learn the graph and thus enable inductive learning. Then, a response-based distillation is introduced to transfer knowledge from the CNN to the GNN and bridge the two heterogeneous networks. Notably, because it simultaneously extracts the intra-sample representation of each instance and the topological relationships across the dataset, the distilled ``boosted'' two-layer GNN achieves much higher performance on Mini-ImageNet than CNNs containing dozens of layers, such as ResNet152.
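For concreteness, a minimal sketch of how the two components are commonly instantiated follows; the neighbor set $\mathcal{N}_i$, the temperatures $\tau$ and $T$, and the weight $\lambda$ are illustrative assumptions, not the paper's exact formulation. A differentiable sparse graph head can, for instance, build a row-normalized $k$-nearest-neighbor adjacency from the extracted features $x_i$:
\[
\hat{A}_{ij} = \frac{\exp\!\left(-\lVert x_i - x_j \rVert_2^2 / \tau\right)}{\sum_{k \in \mathcal{N}_i} \exp\!\left(-\lVert x_i - x_k \rVert_2^2 / \tau\right)} \ \text{for } j \in \mathcal{N}_i, \qquad \hat{A}_{ij} = 0 \ \text{otherwise},
\]
which keeps the learned graph sparse and differentiable with respect to the features, so it can be constructed per batch at inference time and therefore supports inductive learning. Response-based distillation, in its standard form (Hinton et al.), matches the softened logits of the CNN teacher $z^{\mathrm{CNN}}$ and the GNN student $z^{\mathrm{GNN}}$:
\[
\mathcal{L} = \mathcal{L}_{\mathrm{CE}}\!\left(y,\ \sigma(z^{\mathrm{GNN}})\right) + \lambda\, T^2\, \mathrm{KL}\!\left(\sigma(z^{\mathrm{CNN}}/T)\ \Big\|\ \sigma(z^{\mathrm{GNN}}/T)\right),
\]
where $\sigma$ denotes the softmax function and $y$ the ground-truth label.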