BioTrain: Sub-MB, Sub-50mW On-Device Fine-Tuning for Edge-AI on Biosignals

Biosignals vary substantially across subjects and sessions, inducing severe domain shifts that degrade the post-deployment performance of small, edge-oriented AI models. On-device adaptation is therefore essential both to preserve user privacy and to ensure system reliability. However, existing sub-100 mW MCU-based wearable platforms can support only shallow or sparse adaptation schemes, because the memory footprint and computational cost of full backpropagation (BP) are prohibitive. In this paper, we propose BioTrain, a framework that enables full-network fine-tuning of state-of-the-art biosignal models under milliwatt-scale power and sub-megabyte memory constraints. We validate BioTrain with both offline and on-device benchmarks on EEG and EOG datasets, covering Day-1 new-subject calibration and longitudinal adaptation to signal drift. Experimental results show that full-network fine-tuning improves accuracy by up to 35% over non-adapted baselines and outperforms last-layer updates by approximately 7% during new-subject calibration. On the GAP9 MCU platform, BioTrain achieves an on-device training throughput of 17 samples/s for EEG models and 85 samples/s for EOG models within a power envelope below 50 mW. In addition, BioTrain's efficient memory allocator and network topology optimization enable the use of a large batch size while reducing peak memory usage. For fully on-chip BP on GAP9, BioTrain reduces the memory footprint by 8.1x, from 5.4 MB to 0.67 MB, compared to conventional full-network fine-tuning using batch normalization with batch size 8.
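As a quick sanity check on the figures quoted above, the following minimal sketch recomputes the memory reduction factor and derives an upper bound on the training energy per sample. The energy bound is not a reported result: it assumes the full 50 mW envelope is drawn continuously at the stated throughput, so the true per-sample energy should lie below it. The function names are illustrative and are not part of BioTrain.

```python
# Figures quoted in the abstract.
MEM_BASELINE_MB = 5.4    # conventional full-network fine-tuning (BN, batch size 8)
MEM_BIOTRAIN_MB = 0.67   # BioTrain, fully on-chip BP on GAP9
POWER_BUDGET_MW = 50.0   # stated upper bound on the power envelope
THROUGHPUT_SPS = {"EEG": 17.0, "EOG": 85.0}  # training samples per second


def memory_reduction(baseline_mb: float, optimized_mb: float) -> float:
    """Return the factor by which the memory footprint shrinks."""
    return baseline_mb / optimized_mb


def energy_per_sample_mj(power_mw: float, samples_per_s: float) -> float:
    """Energy per training sample in millijoules.

    mW divided by samples/s gives mJ per sample. Because power_mw is an
    upper bound (assumption: the whole envelope is used), so is the result.
    """
    return power_mw / samples_per_s


reduction = memory_reduction(MEM_BASELINE_MB, MEM_BIOTRAIN_MB)
energy_bounds = {
    model: energy_per_sample_mj(POWER_BUDGET_MW, rate)
    for model, rate in THROUGHPUT_SPS.items()
}

print(f"memory reduction: {reduction:.1f}x")
for model, bound in energy_bounds.items():
    print(f"{model}: below {bound:.2f} mJ per training sample")
```

The reduction factor rounds to the 8.1x stated in the abstract. Under the stated assumption, the bounds come out at roughly 2.94 mJ per sample for the EEG model and 0.59 mJ per sample for the EOG model.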
