Online Algorithms for Hierarchical Inference in Deep Learning applications at the Edge

From arXiv: The original version was submitted to a journal and later revised. The updated version has been accepted by a journal and will be published soon. The 'Journal reference' will be updated when the information is available.

We consider a resource-constrained Edge Device (ED), such as an IoT sensor or a microcontroller unit, embedded with a small-size ML model (S-ML) for a generic classification application, and an Edge Server (ES) that hosts a large-size ML model (L-ML). Since the inference accuracy of S-ML is lower than that of L-ML, offloading all data samples to the ES yields high inference accuracy, but it defeats the purpose of embedding S-ML on the ED and forfeits the benefits of local inference: reduced latency, bandwidth savings, and energy efficiency. To get the best of both worlds, i.e., the benefits of inference on the ED and the benefits of inference on the ES, we explore the idea of Hierarchical Inference (HI), wherein an S-ML inference is accepted only when it is correct; otherwise, the data sample is offloaded for L-ML inference. However, the ideal implementation of HI is infeasible because the correctness of the S-ML inference is not known to the ED. We propose an online meta-learning framework that the ED can use to predict the correctness of the S-ML inference. In particular, we propose to use the maximum softmax value output by S-ML for a data sample to decide whether to offload it. The resulting online learning problem turns out to be a Prediction with Expert Advice (PEA) problem with a continuous expert space. We propose two different algorithms and prove sublinear regret bounds for them without any assumption on the smoothness of the loss function. We evaluate and benchmark the performance of the proposed algorithms for an image classification application using four datasets: Imagenette, Imagewoof, MNIST, and CIFAR-10.
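
To make the threshold-based offloading decision concrete, the following is a minimal sketch, not the algorithms proposed in the paper (the abstract does not specify them). It swaps in the textbook exponential-weights (Hedge) rule over a fixed grid of thresholds, which discretizes the continuous expert space. The class and function names, the grid size, the learning rate, the offloading cost of 0.3, the 0/1 misclassification cost, and the assumption that the correctness of S-ML is revealed after every round are all illustrative assumptions, as is the synthetic data stream.

```python
import math
import random


class HedgeThreshold:
    """Exponential weights (Hedge) over a fixed grid of confidence thresholds.

    Illustrative only: each expert is one threshold on the maximum softmax
    value; a sample is offloaded when its max softmax falls below the threshold.
    """

    def __init__(self, num_experts=21, eta=0.5, offload_cost=0.3, seed=0):
        # Expert k uses threshold k / (num_experts - 1), covering [0, 1].
        self.thresholds = [k / (num_experts - 1) for k in range(num_experts)]
        self.log_weights = [0.0] * num_experts  # log-domain for stability
        self.eta = eta                          # learning rate (assumed value)
        self.offload_cost = offload_cost        # assumed fixed cost in [0, 1)
        self.rng = random.Random(seed)

    def _probabilities(self):
        m = max(self.log_weights)
        weights = [math.exp(lw - m) for lw in self.log_weights]
        total = sum(weights)
        return [w / total for w in weights]

    def decide(self, max_softmax):
        """Sample a threshold from the current weights; True means offload."""
        theta = self.rng.choices(self.thresholds, weights=self._probabilities())[0]
        return max_softmax < theta

    def expert_loss(self, theta, max_softmax, local_correct):
        """Assumed loss: offloading costs offload_cost; accepting costs 0 or 1."""
        if max_softmax < theta:
            return self.offload_cost
        return 0.0 if local_correct else 1.0

    def update(self, max_softmax, local_correct):
        # Full-feedback assumption: correctness of S-ML is known every round.
        for k, theta in enumerate(self.thresholds):
            loss = self.expert_loss(theta, max_softmax, local_correct)
            self.log_weights[k] -= self.eta * loss

    def best_threshold(self):
        """Threshold with the smallest cumulative loss so far."""
        k = max(range(len(self.thresholds)), key=lambda i: self.log_weights[i])
        return self.thresholds[k]


def simulate(num_rounds=2000, seed=1):
    """Toy stream: S-ML is correct exactly when its max softmax is >= 0.6."""
    data_rng = random.Random(seed)
    learner = HedgeThreshold()
    total_loss = 0.0
    for _ in range(num_rounds):
        p = data_rng.random()        # synthetic max softmax value
        local_correct = p >= 0.6     # synthetic correctness rule
        if learner.decide(p):
            total_loss += learner.offload_cost
        else:
            total_loss += 0.0 if local_correct else 1.0
        learner.update(p, local_correct)
    return learner, total_loss / num_rounds
```

On this toy stream, calling `simulate()` returns the learner together with its average per-round loss, and `learner.best_threshold()` reports the grid threshold with the lowest cumulative loss. A real deployment would differ in at least one important respect: the ED typically learns whether S-ML was correct only for samples it offloads, so the full-feedback update used here is a simplification.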
