Human Activity Recognition (HAR) on resource-constrained wearables requires models that balance accuracy against strict memory and computational budgets. State-of-the-art lightweight architectures such as TinierHAR (34K parameters) and TinyHAR (55K parameters) achieve strong accuracy but exceed the memory budgets of microcontrollers with limited SRAM once operating system overhead is considered. We present MicroBi-ConvLSTM, an ultra-lightweight convolutional recurrent architecture with an average of 11.4K parameters, obtained by combining two-stage convolutional feature extraction with 4x temporal pooling and a single bidirectional LSTM layer. This represents a 2.9x parameter reduction relative to TinierHAR and an 11.9x reduction relative to DeepConvLSTM, while preserving linear O(N) complexity in sequence length. Evaluation across eight diverse HAR benchmarks shows that MicroBi-ConvLSTM remains competitive within the ultra-lightweight regime, reaching 93.41% macro F1 on UCI-HAR, 94.46% on SKODA assembly gestures, and 88.98% on Daphnet freezing-of-gait detection. A systematic ablation reveals task-dependent component contributions: bidirectionality benefits episodic event detection but provides only marginal gains on periodic locomotion. On-device deployment on the Raspberry Pi Pico 2 and ESP32 validates hardware viability. MicroBi-ConvLSTM is the only architecture to achieve full 8/8 dataset coverage on both platforms, with 72.8 ms average latency on the Pico 2 and 97.9% PyTorch parity on the ESP32, while all three baselines show partial or complete deployment failure.
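To make the parameter budget and the linear-complexity claim concrete, the following sketch tallies parameters for a network shaped like the one described: two convolutional stages, pooling that shortens the sequence 4x, one bidirectional LSTM, and a linear classifier. Every hyperparameter here is a hypothetical illustration, not the paper's published configuration: the filter counts (16, 32), kernel size 5, 2x pooling after each stage, hidden size 20, and a classifier on the concatenated final forward/backward states are all assumptions. Bias terms follow the PyTorch `nn.Conv1d` and `nn.LSTM` conventions, so the total is only expected to land in the same order of magnitude as the reported 11.4K average, not to match it.

```python
from dataclasses import dataclass


def conv1d_params(in_ch: int, out_ch: int, kernel: int) -> int:
    """Conv1d weights plus one bias per output channel."""
    return in_ch * out_ch * kernel + out_ch


def lstm_params(input_size: int, hidden: int, bidirectional: bool) -> int:
    """One LSTM layer: 4 gates, each with input weights, recurrent weights and
    two bias vectors (PyTorch keeps bias_ih and bias_hh separately)."""
    per_direction = 4 * (hidden * input_size + hidden * hidden + 2 * hidden)
    return per_direction * (2 if bidirectional else 1)


def linear_params(in_features: int, out_features: int) -> int:
    """Dense layer weights plus biases."""
    return in_features * out_features + out_features


@dataclass
class SketchConfig:
    # Hypothetical hyperparameters, chosen for illustration only.
    in_channels: int = 9       # e.g. UCI-HAR's 9 inertial channels
    n_classes: int = 6
    window: int = 128          # samples per input window
    filters: tuple = (16, 32)  # output channels of the two conv stages
    kernel: int = 5
    pool: int = 2              # assumed 2x pooling per stage -> 4x overall
    hidden: int = 20           # LSTM hidden size per direction


def parameter_breakdown(cfg: SketchConfig) -> dict:
    """Per-block parameter counts; pooling layers have no parameters."""
    f1, f2 = cfg.filters
    parts = {
        "conv1": conv1d_params(cfg.in_channels, f1, cfg.kernel),
        "conv2": conv1d_params(f1, f2, cfg.kernel),
        "bilstm": lstm_params(f2, cfg.hidden, bidirectional=True),
        # Assumed head: reads concatenated forward/backward final states.
        "head": linear_params(2 * cfg.hidden, cfg.n_classes),
    }
    parts["total"] = sum(parts.values())
    return parts


def lstm_steps(cfg: SketchConfig) -> int:
    """Time steps the BiLSTM processes after both pooling stages."""
    return cfg.window // (cfg.pool * cfg.pool)


def recurrent_macs(cfg: SketchConfig, steps: int) -> int:
    """Multiply-accumulates in the BiLSTM gate matrix products.
    The per-step cost is fixed, so the total grows linearly with steps."""
    per_step = 4 * cfg.hidden * (cfg.filters[1] + cfg.hidden)
    return 2 * steps * per_step  # two directions


if __name__ == "__main__":
    cfg = SketchConfig()
    print(parameter_breakdown(cfg))
    steps = lstm_steps(cfg)
    print(steps, recurrent_macs(cfg, steps))
```

Under these assumed settings the BiLSTM accounts for 8,640 of the 12,214 parameters. Pooling does not change the parameter count; it shortens the sequence the LSTM must process (128 samples become 32 steps), and doubling the number of steps exactly doubles the recurrent MACs, which is the linear O(N) behavior the abstract refers to.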