Helios 2.0: A Robust, Ultra-Low Power Gesture Recognition System Optimised for Event-Sensor based Wearables

Prarthana Bhattacharyya, Joshua Mitton, Ryan Page, Owen Morgan, Oliver Powell, Benjamin Menzies, Gabriel Homewood, Kemi Jacobs, Paolo Baesso, Taru Muhonen, Richard Vigars, Louis Berridge

From arXiv. 24 pages, 14 figures. Prarthana Bhattacharyya, Joshua Mitton, Ryan Page, Owen Morgan, and Oliver Powell contributed equally to this paper.

We present an advance in wearable technology: a mobile-optimised, real-time, ultra-low-power event camera system that enables natural hand gesture control for smart glasses, dramatically improving the user experience. While hand gesture recognition in computer vision has advanced significantly, critical challenges remain in creating systems that are intuitive, adaptable across diverse users and environments, and energy-efficient enough for practical wearable applications. Our approach tackles these challenges through carefully selected microgestures: lateral thumb swipes across the index finger (in both directions) and a double pinch between the thumb and index fingertips. These human-centred interactions leverage natural hand movements, so users do not need to learn complex command sequences. To overcome variability in users and environments, we developed a novel simulation methodology that enables comprehensive domain sampling without extensive real-world data collection. Our power-optimised architecture achieves F1 scores above 80% on benchmark datasets featuring diverse users and environments. The resulting models operate at just 6-8 mW when exploiting the Qualcomm Snapdragon Hexagon DSP, with our 2-channel implementation exceeding 70% F1 and our 6-channel model surpassing 80% F1 across all gesture classes in user studies. These results were achieved using only synthetic training data. This improves on the state of the art in F1 score by 20%, with a 25x reduction in power when using the DSP. This advance brings the deployment of ultra-low-power vision systems in wearable devices closer and opens new possibilities for seamless human-computer interaction.
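To make the "F1 across all gesture classes" criterion concrete, the following is a minimal sketch of a per-class F1 computation for the three microgestures. It is an illustration only, not the authors' evaluation code: the class labels, the function names `per_class_f1` and `all_classes_above`, and the toy predictions are all assumptions introduced here.

```python
from typing import Dict, List

# Hypothetical label names for the three microgestures described in the abstract.
GESTURES = ["swipe_left", "swipe_right", "double_pinch"]


def per_class_f1(y_true: List[str], y_pred: List[str],
                 classes: List[str]) -> Dict[str, float]:
    """Return the one-vs-rest F1 score for each class.

    F1 = 2*TP / (2*TP + FP + FN); a class with no true or predicted
    samples is given a score of 0.0.
    """
    scores = {}
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


def all_classes_above(scores: Dict[str, float], threshold: float) -> bool:
    """True only if every class exceeds the threshold (e.g. 0.8)."""
    return all(s > threshold for s in scores.values())


if __name__ == "__main__":
    # Toy labels, invented for illustration; not data from the paper.
    y_true = ["swipe_left", "swipe_left", "swipe_right", "double_pinch"]
    y_pred = ["swipe_left", "swipe_right", "swipe_right", "double_pinch"]
    scores = per_class_f1(y_true, y_pred, GESTURES)
    print(scores)
    print(all_classes_above(scores, 0.8))
```

Requiring every class to clear the threshold is stricter than averaging: a single weak gesture class fails the check even if the mean F1 is high.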
