Hand Gesture Recognition (HGR) enables intuitive human-computer interaction in a wide range of real-world contexts. However, existing frameworks often fail to meet the real-time requirements essential for practical HGR applications. This study introduces a robust, skeleton-based framework for dynamic HGR that recasts dynamic hand gesture recognition as a static image classification task, effectively reducing both hardware and computational demands. Our framework uses a data-level fusion technique to encode the 3D skeleton data of dynamic gestures into static RGB spatiotemporal images. It incorporates a specialized end-to-end Ensemble Tuner (e2eET) Multi-Stream CNN architecture that exploits the semantic connections between data representations while minimizing computational cost. Evaluated on five benchmark datasets (SHREC'17, DHG-14/28, FPHA, LMDHG, and CNR), the framework achieved performance competitive with the state of the art. Its suitability for real-time HGR applications was further demonstrated through deployment on standard consumer PC hardware, where it exhibited low latency and minimal resource usage in real-world settings. This successful deployment underscores the framework's potential to enhance real-time applications in fields such as virtual/augmented reality, ambient intelligence, and assistive technologies, providing a scalable and efficient solution for dynamic gesture recognition.
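The abstract does not specify the exact encoding scheme, but the general idea of data-level fusion from skeletons to images can be sketched as follows: each frame of a gesture becomes one row of pixels, each joint one column, and the joint's normalized x/y/z coordinates fill the R/G/B channels. This is a hypothetical minimal illustration, not the paper's actual method; the array shapes, joint count, and normalization are assumptions.

```python
import numpy as np

def skeleton_to_rgb_image(seq):
    """Encode a dynamic-gesture skeleton sequence as a static RGB image.

    seq: array of shape (T, J, 3) -- T frames, J joints, (x, y, z) per joint.
    Returns a (T, J, 3) uint8 image in which each pixel's RGB channels carry
    the normalized x/y/z coordinates of one joint at one time step, so the
    whole gesture is summarized in a single spatiotemporal image.
    """
    seq = np.asarray(seq, dtype=np.float64)
    # Normalize each coordinate axis to [0, 1] over the whole sequence.
    mins = seq.min(axis=(0, 1), keepdims=True)
    maxs = seq.max(axis=(0, 1), keepdims=True)
    norm = (seq - mins) / np.maximum(maxs - mins, 1e-8)
    # Scale to 8-bit pixel intensities: rows = time, columns = joints.
    return (norm * 255.0).astype(np.uint8)

# Example: 32 frames of a (hypothetical) 22-joint hand skeleton.
rng = np.random.default_rng(0)
image = skeleton_to_rgb_image(rng.normal(size=(32, 22, 3)))
print(image.shape)  # (32, 22, 3)
```

Once a gesture is encoded this way, any off-the-shelf image classifier (such as the multi-stream CNN described above) can be trained on these images, which is what reduces the dynamic-recognition problem to static image classification.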