BioTrain: Sub-MB, Sub-50mW On-Device Fine-Tuning for Edge-AI on Biosignals

Biosignals exhibit substantial cross-subject and cross-session variability, inducing severe domain shifts that degrade the post-deployment performance of small, edge-oriented AI models. On-device adaptation is therefore essential both to preserve user privacy and to ensure system reliability. However, existing sub-100 mW MCU-based wearable platforms can support only shallow or sparse adaptation schemes, because full backpropagation (BP) has a prohibitive memory footprint and computational cost. In this paper, we propose BioTrain, a framework that enables full-network fine-tuning of state-of-the-art biosignal models under milliwatt-scale power and sub-megabyte memory constraints. We validate BioTrain with both offline and on-device benchmarks on EEG and EOG datasets, covering Day-1 new-subject calibration and longitudinal adaptation to signal drift. Experimental results show that full-network fine-tuning improves accuracy by up to 35% over non-adapted baselines and outperforms last-layer updates by approximately 7% during new-subject calibration. On the GAP9 MCU platform, BioTrain achieves an on-device training throughput of 17 samples/s for EEG models and 85 samples/s for EOG models within a power envelope below 50 mW. In addition, BioTrain's efficient memory allocator and network topology optimization enable the use of a large batch size while reducing peak memory usage. For fully on-chip BP on GAP9, BioTrain reduces the memory footprint by 8.1x, from 5.4 MB to 0.67 MB, compared to conventional full-network fine-tuning using batch normalization with batch size 8.
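As a quick sanity check on the figures quoted above, the following minimal sketch recomputes the memory reduction factor and derives an upper bound on training energy per sample. The energy bound is not stated in the abstract; it is an assumption obtained by dividing the 50 mW power envelope by the reported throughput, so it overestimates the energy if the actual power draw is lower. The function names are illustrative only and are not part of BioTrain.

```python
# Figures quoted in the abstract.
BASELINE_MB = 5.4          # conventional full-network fine-tuning (batch norm, batch size 8)
BIOTRAIN_MB = 0.67         # BioTrain, fully on-chip BP on GAP9
POWER_ENVELOPE_MW = 50.0   # reported upper bound on power during training
THROUGHPUT_SPS = {"EEG": 17.0, "EOG": 85.0}  # training samples per second


def reduction_factor(before_mb: float, after_mb: float) -> float:
    """Ratio of the baseline memory footprint to the reduced one."""
    return before_mb / after_mb


def energy_bound_mj(power_mw: float, samples_per_s: float) -> float:
    """Upper bound on energy per training sample.

    mW divided by samples/s gives mJ per sample. This is a bound only,
    because the abstract reports power as "below 50 mW" (assumption:
    the envelope applies at the reported throughput).
    """
    return power_mw / samples_per_s


if __name__ == "__main__":
    factor = reduction_factor(BASELINE_MB, BIOTRAIN_MB)
    print(f"memory reduction: {factor:.1f}x")
    print(f"fits in sub-megabyte budget: {BIOTRAIN_MB < 1.0}")
    for model, sps in THROUGHPUT_SPS.items():
        bound = energy_bound_mj(POWER_ENVELOPE_MW, sps)
        print(f"{model}: under {bound:.2f} mJ per training sample")
```

Under these assumptions, the reduction factor rounds to the reported 8.1x, and the per-sample training energy stays below roughly 3 mJ for the EEG model and below roughly 0.6 mJ for the EOG model.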
