Online Algorithms for Hierarchical Inference in Deep Learning applications at the Edge

From arXiv. This work will appear in a journal soon, and the 'Journal reference' will be updated when the information is available. The submission contains 22 pages, 7 figures (including subfigures), 2 tables, and 2 algorithms.

We consider a resource-constrained Edge Device (ED) embedded with a small-size ML model (S-ML) for a generic classification application, and an Edge Server (ES) that hosts a large-size ML model (L-ML). Because the inference accuracy of S-ML is lower than that of L-ML, offloading all data samples to the ES yields high inference accuracy, but it defeats the purpose of embedding S-ML on the ED and forfeits the reduced latency, bandwidth savings, and energy efficiency of local inference. To get the best of both worlds, i.e., the benefits of inference on the ED and the benefits of inference on the ES, we explore the idea of Hierarchical Inference (HI), wherein the S-ML inference is accepted only when it is correct; otherwise, the data sample is offloaded for L-ML inference. However, the ideal implementation of HI is infeasible because the correctness of the S-ML inference is not known to the ED. We therefore propose an online meta-learning framework to predict the correctness of the S-ML inference. The resulting online learning problem turns out to be a Prediction with Expert Advice (PEA) problem with a continuous expert space. We consider two scenarios: the full feedback scenario, where the ED receives feedback on the correctness of the S-ML inference once it accepts the inference, and the no-local feedback scenario, where the ED does not receive the ground truth for the classification. We propose the HIL-F and HIL-N algorithms for these scenarios, respectively, and prove a regret bound that is sublinear in the number of data samples. We evaluate and benchmark the performance of the proposed algorithms for image classification applications using four datasets, namely, Imagenette, Imagewoof, MNIST, and CIFAR-10.
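To make the PEA framing concrete, the following is a minimal sketch of a generic exponentially weighted forecaster, not the HIL-F or HIL-N algorithm itself. It rests on several assumptions that the abstract does not state: each expert is a threshold on an S-ML confidence score, the continuous expert space is approximated by a uniform grid, offloading incurs a fixed cost `beta`, an accepted incorrect inference costs 1, and the correctness of the S-ML inference is revealed after every round. The names `round_cost` and `hedge_thresholds` are hypothetical.

```python
import math
import random


def round_cost(theta, confidence, correct, beta):
    """Cost of one round for the expert with threshold theta (assumed cost model)."""
    if confidence < theta:
        return beta  # sample is offloaded to the ES at a fixed cost
    return 0.0 if correct else 1.0  # S-ML inference is accepted locally


def hedge_thresholds(samples, beta=0.3, grid_size=21, seed=0):
    """Run exponential weights over a grid of thresholds.

    samples: list of (confidence, correct) pairs, one per data sample.
    Returns the incurred cost, the threshold grid, and the final log-weights.
    """
    rng = random.Random(seed)
    # Assumption: discretize the continuous expert space [0, 1] uniformly.
    thetas = [i / (grid_size - 1) for i in range(grid_size)]
    n = max(len(samples), 1)
    # Standard learning rate for losses in [0, 1] and a known horizon n.
    eta = math.sqrt(8.0 * math.log(grid_size) / n)
    log_w = [0.0] * grid_size
    total_cost = 0.0
    for confidence, correct in samples:
        # Draw a threshold with probability proportional to its weight.
        top = max(log_w)
        weights = [math.exp(lw - top) for lw in log_w]
        theta = rng.choices(thetas, weights=weights, k=1)[0]
        total_cost += round_cost(theta, confidence, correct, beta)
        # Every expert's cost is computable from the revealed correctness,
        # so all weights are updated in each round.
        for i, th in enumerate(thetas):
            log_w[i] -= eta * round_cost(th, confidence, correct, beta)
    return total_cost, thetas, log_w


if __name__ == "__main__":
    # Synthetic stream: the S-ML inference is correct iff confidence >= 0.6.
    stream = [(c, c >= 0.6) for c in [0.1, 0.3, 0.5, 0.7, 0.9]] * 200
    cost, thetas, log_w = hedge_thresholds(stream)
    best = thetas[max(range(len(thetas)), key=lambda i: log_w[i])]
    print("incurred cost:", cost, "highest-weight threshold:", best)
```

In this sketch, thresholds that offload the incorrect low-confidence samples and accept the correct high-confidence ones accumulate the least cost, so the weights concentrate on them over time. Handling the continuous expert space without a fixed grid, and the no-local feedback scenario, is what the paper's algorithms address and is outside the scope of this illustration.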
