Automatic sign language recognition is an important topic in human-computer interaction and machine learning. On the one hand, it poses a complex challenge that draws on several fields, such as video processing, image processing, intelligent systems and linguistics. On the other hand, robust sign language recognition could assist translation and support the integration of hearing-impaired people. This paper offers two main contributions. The first is a database of handshapes for the Argentinian Sign Language (LSA), a topic that has barely been addressed so far. The second is a technique for image processing, descriptor extraction and subsequent handshape classification using a supervised adaptation of self-organizing maps called ProbSom. This technique is compared with other state-of-the-art classifiers, such as Support Vector Machines (SVM), Random Forests and Neural Networks. The database contains 800 images of 16 LSA handshapes and is a first step towards a comprehensive database of Argentinian signs. The ProbSom-based neural classifier, using the proposed descriptor, achieved an accuracy above 90%.