Labeling is the cornerstone of supervised machine learning, which has been applied in many domains, sign language recognition among them. However, supervised algorithms must be trained on large amounts of consistently labeled data to produce a model that generalizes well. In addition, there is a great need for an automated solution that works with any national sign language. Although language-agnostic transcription systems exist, such as the Hamburg Sign Language Notation System (HamNoSys), which describes the signer's initial position and body movement rather than the meaning of the glosses, providing accurate and reliable labels for every real-world use case remains difficult. As a result, the industry relies heavily on manual annotation and labeling of the available video data. In this work, we address this issue by thoroughly analyzing the HamNoSys labels provided by various maintainers of open sign language corpora in five sign languages, in order to examine the challenges encountered in labeling video data. We also investigate the consistency and objectivity of HamNoSys-based labels for the purpose of training machine learning models. Our findings provide insight into the limitations of current labeling methods and pave the way for future research on more accurate and efficient solutions for sign language recognition.