This paper proposes SNAP-UQ, a practical method for single-pass, label-free uncertainty estimation based on depth-wise next-activation prediction. SNAP-UQ taps a small set of backbone layers and uses tiny int8 heads to predict the mean and scale of the next activation from a low-rank projection of the previous one; the resulting standardized prediction error forms a depth-wise surprisal signal that is aggregated and mapped through a lightweight monotone calibrator into an actionable uncertainty score. The design introduces no temporal buffers or auxiliary exits and preserves state-free inference, while increasing the deployment footprint by only a few tens of kilobytes. Across vision and audio backbones, SNAP-UQ reduces flash and latency relative to early-exit and deep-ensemble baselines (typically $\sim$40--60% smaller and $\sim$25--35% faster), whereas several competing methods at similar accuracy often exceed MCU memory limits. On corrupted streams, it improves accuracy-drop event detection by multiple AUPRC points and maintains strong failure detection (AUROC $\approx 0.9$) in a single forward pass. By grounding uncertainty in layer-to-layer dynamics rather than solely in output confidence, SNAP-UQ offers a novel, resource-efficient basis for robust TinyML monitoring.