QCore: Data-Efficient, On-Device Continual Calibration for Quantized Models -- Extended Version

We are witnessing an increasing availability of streaming data that may contain valuable information on the underlying processes. It is thus attractive to be able to deploy machine learning models on edge devices near sensors such that decisions can be made instantaneously, rather than first having to transmit incoming data to servers. To enable deployment on edge devices with limited storage and computational capabilities, the full-precision parameters in standard models can be quantized to use fewer bits. The resulting quantized models are then calibrated using back-propagation and full training data to ensure accuracy. This one-time calibration works for deployments in static environments. However, model deployment in dynamic edge environments calls for continual calibration to adaptively adjust quantized models to fit new incoming data, which may have different distributions. The first difficulty in enabling continual calibration on the edge is that the full training data may be too large and thus not always available on edge devices. The second difficulty is that the use of back-propagation on the edge for repeated calibration is too expensive. We propose QCore to enable continual calibration on the edge. First, it compresses the full training data into a small subset to enable effective calibration of quantized models with different bit-widths. We also propose means of updating the subset when new streaming data arrives to reflect changes in the environment, while not forgetting earlier training data. Second, we propose a small bit-flipping network that works with the subset to update quantized model parameters, thus enabling efficient continual calibration without back-propagation. An experimental study, conducted with real-world data in a continual learning setting, offers insight into the properties of QCore and shows that it is capable of outperforming strong baseline methods.
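To make the two ideas in the abstract concrete, the following is a minimal sketch, not QCore's actual implementation. It assumes uniform symmetric quantization (the abstract does not say which scheme is used) and models a "bit flip" as shifting an integer code by -1, 0, or +1. The function names `quantize`, `dequantize`, and `apply_flips` are hypothetical, and the flip values are supplied by hand here, whereas in QCore a small network would produce them.

```python
def quantize(weights, bits):
    """Map float weights to signed integer codes that fit in `bits` bits.

    Assumption: uniform symmetric quantization with one scale per tensor.
    """
    qmax = 2 ** (bits - 1) - 1                      # e.g. 7 for 4 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0  # float value of one step
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from integer codes."""
    return [c * scale for c in codes]


def apply_flips(codes, flips, bits):
    """Shift each code by -1, 0, or +1 and clamp to the representable range.

    This updates quantized parameters without computing any gradients.
    """
    qmax = 2 ** (bits - 1) - 1
    return [max(-qmax, min(qmax, c + f)) for c, f in zip(codes, flips)]


if __name__ == "__main__":
    weights = [0.7, -0.3, 0.0, 0.1]
    codes, scale = quantize(weights, bits=4)
    print("codes:", codes)
    print("after flips:", apply_flips(codes, [1, -1, 0, 1], bits=4))
    print("dequantized:", dequantize(codes, scale))
```

Because the update acts directly on integer codes, the device only needs to store and adjust low-bit values, which is the property that makes a calibration step without back-propagation plausible on constrained hardware.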

