Deep Neural Networks (DNNs) have drawn attention because of their outstanding performance on various tasks. However, deploying full-fledged DNNs on resource-constrained devices (edge, mobile, IoT) is difficult due to their large size. Several approaches address this issue, such as offloading part of the computation to the cloud for final inference (split computing) or performing inference at an intermediate layer without passing through all layers (early exits). In this work, we propose combining both approaches by using early exits in split computing. In our approach, we decide up to what depth of the DNN to compute on the device (the splitting layer) and whether a sample can exit at this layer or needs to be offloaded. The decisions are based on a weighted combination of accuracy, computational cost, and communication cost. We develop an algorithm named SplitEE to learn an optimal policy. Since pre-trained DNNs are often deployed in new domains where ground-truth labels may be unavailable and samples arrive in a streaming fashion, SplitEE works in an online and unsupervised setup. We perform extensive experiments on five different datasets. SplitEE achieves a significant cost reduction ($>50\%$) with a slight drop in accuracy ($<2\%$) compared to the case where all samples are inferred at the final layer. The anonymized source code is available at \url{https://anonymous.4open.science/r/SplitEE_M-B989/README.md}.
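To make the decision rule described above concrete, the following is a minimal sketch and not the SplitEE implementation. It assumes that the model's prediction confidence stands in for accuracy (since labels are unavailable), that on-device cost grows linearly with depth, and that a UCB-style bandit picks the splitting layer. The names `reward`, `UCBSplitSelector`, `threshold`, `per_layer_cost`, `offload_cost`, `mu`, and `beta` are hypothetical and do not come from the source.

```python
import math


def reward(conf_split, conf_final, layer, threshold,
           per_layer_cost, offload_cost, mu):
    """Confidence minus weighted cost for one sample (illustrative only).

    Assumption: on-device compute cost is per_layer_cost * layer.
    """
    compute_cost = per_layer_cost * layer
    if conf_split >= threshold:
        # Confident enough: the sample exits at the splitting layer.
        return conf_split - mu * compute_cost
    # Otherwise the sample is offloaded and inferred at the final layer.
    return conf_final - mu * (compute_cost + offload_cost)


class UCBSplitSelector:
    """UCB-style bandit over candidate splitting layers (hypothetical helper)."""

    def __init__(self, num_layers, beta=1.0):
        self.counts = [0] * num_layers   # times each layer was chosen
        self.means = [0.0] * num_layers  # running mean reward per layer
        self.beta = beta                 # exploration weight
        self.t = 0                       # number of samples seen so far

    def select(self):
        """Return the 0-based index of the splitting layer for the next sample."""
        self.t += 1
        # Try every layer once before using the confidence bounds.
        for arm, n in enumerate(self.counts):
            if n == 0:
                return arm

        def ucb(arm):
            bonus = math.sqrt(math.log(self.t) / self.counts[arm])
            return self.means[arm] + self.beta * bonus

        return max(range(len(self.counts)), key=ucb)

    def update(self, arm, r):
        """Fold the observed reward r into the running mean of the chosen layer."""
        self.counts[arm] += 1
        self.means[arm] += (r - self.means[arm]) / self.counts[arm]
```

In a streaming loop, one would call `select()` for each incoming sample, run the network up to the chosen layer, compute `reward(...)` from the observed confidences, and pass it to `update()`. No ground-truth labels are needed at any step, which matches the online, unsupervised setting described in the abstract.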