Reliable uncertainty estimation is a key missing piece for on-device monitoring in TinyML: microcontrollers must detect failures, distribution shift, or accuracy drops under strict flash and latency budgets, yet common uncertainty approaches (deep ensembles, MC dropout, early exits, temporal buffering) typically require multiple forward passes, extra branches, or persistent state that is impractical on milliwatt-scale hardware. This paper proposes SNAP-UQ, a novel and practical method for single-pass, label-free uncertainty estimation based on depth-wise next-activation prediction. SNAP-UQ taps a small set of backbone layers and attaches tiny int8 heads that predict the mean and scale of the next activation from a low-rank projection of the previous one; the resulting standardized prediction error forms a depth-wise surprisal signal, which is aggregated across the tapped layers and mapped through a lightweight monotone calibrator into an actionable uncertainty score. The design adds no temporal buffers or auxiliary exits, preserves state-free inference, and increases the deployment footprint by only a few tens of kilobytes. Across vision and audio backbones, SNAP-UQ reduces flash and latency relative to early-exit and deep-ensemble baselines (typically $\sim$40--60% smaller and $\sim$25--35% faster), whereas several competing methods that reach similar accuracy often exceed MCU memory limits. On corrupted streams, it improves accuracy-drop event detection by several AUPRC points and maintains strong failure detection (AUROC $\approx 0.9$) in a single forward pass. By grounding uncertainty in layer-to-layer dynamics rather than solely in output confidence, SNAP-UQ offers a resource-efficient basis for robust TinyML monitoring.
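The scoring path described above can be illustrated with a minimal floating-point sketch. It assumes a diagonal Gaussian view of each head (a linear map from a low-rank projection of the previous activation to a per-channel mean and log-scale), takes each tapped layer's surprisal as half the mean squared standardized error, aggregates layers by a weighted sum, and uses a logistic function with a positive slope as the monotone calibrator. These choices, and all names (`layer_surprisal`, `aggregate`, `calibrate`, `proj`, `w_mu`, `w_logsig`, `alpha`, `beta`), are hypothetical illustrations rather than SNAP-UQ's exact formulation, and the int8 quantization of the heads is omitted for readability.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> List[float]:
    """Dense row-major matrix-vector product."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def layer_surprisal(
    a_prev: Sequence[float],
    a_next: Sequence[float],
    proj: Matrix,       # hypothetical r x d_prev low-rank projection
    w_mu: Matrix,       # hypothetical d_next x r weights for the predicted mean
    b_mu: Sequence[float],
    w_logsig: Matrix,   # hypothetical d_next x r weights for the predicted log-scale
    b_logsig: Sequence[float],
) -> float:
    """Surprisal of one tapped layer (assumed form: 0.5 * mean squared z-score)."""
    z = matvec(proj, a_prev)  # compress the previous activation to rank r
    mu = [m + b for m, b in zip(matvec(w_mu, z), b_mu)]
    log_sig = [s + b for s, b in zip(matvec(w_logsig, z), b_logsig)]
    # Standardized prediction error per channel: (observed - predicted mean) / predicted scale.
    errs = [(a - m) / math.exp(ls) for a, m, ls in zip(a_next, mu, log_sig)]
    return 0.5 * sum(e * e for e in errs) / len(errs)


def aggregate(surprisals: Sequence[float], weights: Sequence[float]) -> float:
    """Combine per-layer surprisals into one depth-wise score (assumed: weighted sum)."""
    return sum(w * s for w, s in zip(weights, surprisals))


def calibrate(score: float, alpha: float, beta: float) -> float:
    """Monotone map to [0, 1] (assumed: logistic with positive slope alpha)."""
    if alpha <= 0:
        raise ValueError("alpha must be positive to keep the calibrator increasing")
    t = alpha * score + beta
    if t >= 0:  # numerically stable logistic for both signs of t
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


# Toy example: d_prev = 4, rank r = 2, d_next = 3, unit predicted scale (log-scale 0).
proj = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
w_mu = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
w_logsig = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
zeros = [0.0, 0.0, 0.0]
a_prev = [1.0, 2.0, 0.0, 0.0]

s_in = layer_surprisal(a_prev, [1.0, 2.0, 0.0], proj, w_mu, zeros, w_logsig, zeros)
s_shift = layer_surprisal(a_prev, [3.0, 2.0, 0.0], proj, w_mu, zeros, w_logsig, zeros)
score = aggregate([s_in, s_shift], [0.5, 0.5])
print(s_in, s_shift, calibrate(score, alpha=2.0, beta=-1.0))
```

Under these assumptions, an activation that matches the head's prediction gives zero surprisal for that layer, while a deviation of two predicted standard deviations in one of three channels gives 0.5 · 4/3 ≈ 0.67. Because the calibrator is monotone, a larger aggregated surprisal always maps to a higher uncertainty score.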