Automatic sign language recognition (SLR) is an important topic in human-computer interaction and machine learning. On the one hand, it is a complex challenge that draws on several fields, including video processing, image processing, intelligent systems and linguistics. On the other hand, robust recognition of sign language could assist translation and the integration of hearing-impaired people, as well as the teaching of sign language to the hearing population. SLR systems usually employ Hidden Markov Models, Dynamic Time Warping or similar models to recognize signs. Such techniques exploit the sequential ordering of frames to reduce the number of hypotheses. This paper presents a general probabilistic model for sign classification that combines sub-classifiers based on different types of features, such as position, movement and handshape. The model employs a bag-of-words approach in all classification steps to explore the hypothesis that ordering is not essential for recognition. The proposed model achieved an accuracy of 97% on an Argentinian Sign Language dataset containing 64 classes of signs and 3200 samples, providing some evidence that recognition without ordering is indeed possible.
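To make the order-free idea concrete, the following is a minimal sketch and not the paper's implementation. The abstract does not say how frames are quantized or how the sub-classifiers are combined, so the sketch assumes nearest-codeword quantization and a normalized product of per-feature class posteriors (conditional independence, uniform prior). The names `bag_of_words`, `combine_posteriors`, `frames` and `codebook` are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(frames, codebook):
    """Return a normalized histogram of codeword counts for one video.

    frames:   list of per-frame feature vectors (e.g. hand positions)
    codebook: list of codewords; assumed to come from a prior clustering step
    The histogram ignores the order in which frames occur.
    """
    def nearest(frame):
        # Index of the codeword closest to this frame (Euclidean distance).
        return min(range(len(codebook)),
                   key=lambda k: math.dist(frame, codebook[k]))

    counts = Counter(nearest(f) for f in frames)
    return [counts[k] / len(frames) for k in range(len(codebook))]


def combine_posteriors(posteriors):
    """Fuse class posteriors from several sub-classifiers by a product rule.

    Assumption: sub-classifiers (position, movement, handshape, ...) are
    conditionally independent given the class, with a uniform class prior.
    """
    n_classes = len(posteriors[0])
    # Sum logs instead of multiplying probabilities, for numerical stability.
    log_scores = [sum(math.log(p[c] + 1e-12) for p in posteriors)
                  for c in range(n_classes)]
    top = max(log_scores)
    scores = [math.exp(s - top) for s in log_scores]
    total = sum(scores)
    return [s / total for s in scores]


# Toy data: reversing the frame order leaves the representation unchanged.
frames = [(0.0, 0.0), (1.0, 1.0), (0.1, 0.0), (0.9, 1.0)]
codebook = [(0.0, 0.0), (1.0, 1.0)]
assert bag_of_words(frames, codebook) == bag_of_words(frames[::-1], codebook)
```

Because the histogram discards frame order, any permutation of the same frames yields the same input to the sub-classifiers, which is the property the ordering hypothesis relies on.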