The rapid evolution of deep learning (DL) models and the ever-increasing size of available datasets have increased the research community's interest in the perennially important field of visual hand gesture recognition (VHGR) and have enabled a wide range of applications, such as camera-based sign language understanding and human-computer interaction. Despite the large volume of research in the field, a structured and complete survey on VHGR is still missing, leaving researchers to navigate hundreds of papers to find the right combination of data, model, and approach for each task. This survey aims to fill that gap by presenting a comprehensive overview of this computer vision field. With a systematic research methodology that identifies the state-of-the-art works and a structured presentation of the various methods, datasets, and evaluation metrics, it is intended to serve as a practical guide for researchers, helping them choose the right strategy for a VHGR task. Starting with the methodology used to locate the related literature, the survey identifies the key VHGR approaches, organizes them into a taxonomy, and presents the dimensions that affect the final choice of method, such as input modality, task type, and application domain. The state-of-the-art techniques are grouped into three primary VHGR tasks: static gesture recognition, isolated dynamic gesture recognition, and continuous gesture recognition. For each task, the architectural trends and learning strategies are listed. To support the experimental evaluation of future methods in the field, the survey reviews commonly used datasets and presents the standard performance metrics. It concludes by identifying the major challenges in VHGR, including both general computer vision issues and domain-specific obstacles, and by outlining promising directions for future research.