Once-for-All Channel Mixers (HYPERTINYPW): Generative Compression for TinyML

From arXiv: 12 pages, 5 figures. Accepted at MLSys 2026. TinyML / on-device learning paper on hypernetwork-based compression for ECG and other 1D biosignals, with integer-only inference on commodity MCUs. Evaluated on Apnea-ECG, PTB-XL, and MIT-BIH. The camera-ready version, with additional datasets, experiments, and insights, will appear after May 2026.

Deploying neural networks on microcontrollers is constrained by kilobytes of flash and SRAM, where 1x1 pointwise (PW) mixers often dominate memory even after INT8 quantization across vision, audio, and wearable sensing. We present HYPER-TINYPW, a compression-as-generation approach that replaces most stored PW weights with generated weights: a shared micro-MLP synthesizes PW kernels once at load time from tiny per-layer codes, caches them, and executes them with standard integer operators. This preserves commodity MCU runtimes and adds only a one-off synthesis cost; steady-state latency and energy match INT8 separable CNN baselines. Enforcing a shared latent basis across layers removes cross-layer redundancy, while keeping PW1 in INT8 stabilizes early, morphology-sensitive mixing. We contribute (i) TinyML-faithful packed-byte accounting covering generator, heads/factorization, codes, kept PW1, and backbone; (ii) a unified evaluation with validation-tuned t* and bootstrap confidence intervals; and (iii) a deployability analysis covering integer-only inference and boot versus lazy synthesis. On three ECG benchmarks (Apnea-ECG, PTB-XL, MIT-BIH), HYPER-TINYPW shifts the macro-F1 versus flash Pareto frontier: at about 225 kB it matches a roughly 1.4 MB CNN while being 6.31x smaller (84.15% fewer bytes), retaining at least 95% of large-model macro-F1. Under 32-64 kB budgets it sustains balanced detection where compact baselines degrade. The mechanism applies broadly to other 1D biosignals, on-device speech, and embedded sensing tasks where per-layer redundancy dominates, indicating a wider role for compression-as-generation in resource-constrained ML systems. Beyond ECG, HYPER-TINYPW transfers to TinyML audio: on Speech Commands it reaches 96.2% test accuracy (98.2% best validation), supporting broader applicability to embedded sensing workloads where repeated linear mixers dominate memory.
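The generate-once, cache, then run-with-integer-operators flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: every name (`make_generator`, `synthesize`, `quantize_int8`, `KernelCache`, `pw_conv_int`), the two-layer ReLU generator, the symmetric per-tensor quantization scheme, and all sizes are assumptions chosen for illustration. The toy generator is also not smaller than the kernels it emits; the heads/factorization that the paper's byte accounting covers are omitted here.

```python
import random


def make_generator(code_dim, hidden, out_size, seed=0):
    """Build one shared micro-MLP (hypothetical sizes), stored once for all layers."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(code_dim)] for _ in range(hidden)]
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(out_size)]
    return w1, w2


def synthesize(gen, code):
    """Map one per-layer code to a flat list of float PW weights."""
    w1, w2 = gen
    hidden = [max(0.0, sum(w * c for w, c in zip(row, code))) for row in w1]  # ReLU
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def quantize_int8(weights):
    """Symmetric per-tensor INT8 quantization (an assumed scheme); returns (ints, scale)."""
    max_abs = max(abs(w) for w in weights) or 1.0  # avoid a zero scale
    scale = max_abs / 127.0
    return [max(-127, min(127, round(w / scale))) for w in weights], scale


class KernelCache:
    """Synthesizes each layer's kernel at most once, either at boot or lazily."""

    def __init__(self, gen, codes):
        self.gen = gen
        self.codes = codes      # layer name -> tiny code vector
        self._cache = {}        # layer name -> (int8 kernel, scale)
        self.synth_calls = 0    # counts one-off synthesis work

    def get(self, layer):
        """Lazy synthesis: generate on first use, then serve from the cache."""
        if layer not in self._cache:
            self.synth_calls += 1
            floats = synthesize(self.gen, self.codes[layer])
            self._cache[layer] = quantize_int8(floats)
        return self._cache[layer]

    def warm(self):
        """Boot-time synthesis: generate every layer's kernel up front."""
        for layer in self.codes:
            self.get(layer)


def pw_conv_int(x_q, kernel_q, c_in, c_out):
    """1x1 pointwise mix at one time step using integer multiply-accumulate only."""
    return [
        sum(kernel_q[o * c_in + i] * x_q[i] for i in range(c_in))
        for o in range(c_out)
    ]


if __name__ == "__main__":
    c_in, c_out = 3, 6
    gen = make_generator(code_dim=4, hidden=8, out_size=c_in * c_out)
    codes = {"pw2": [0.1, -0.2, 0.3, 0.4], "pw3": [0.5, 0.0, -0.1, 0.2]}
    cache = KernelCache(gen, codes)
    kernel, scale = cache.get("pw2")               # first call synthesizes
    acc = pw_conv_int([12, -7, 30], kernel, c_in, c_out)
    print(acc, scale, cache.synth_calls)
```

After the first `get` for a layer, repeated calls return the cached integer kernel, so the steady-state path contains only integer multiply-accumulates. Calling `warm()` corresponds to synthesis at boot, while calling `get` on demand corresponds to lazy synthesis.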
