We use insights from research on American Sign Language (ASL) phonology to train models for isolated sign language recognition (ISLR), a step towards automatic sign language understanding. Our key insight is to explicitly account for the role of phonology in sign production, which yields more accurate ISLR than existing work that does not consider sign language phonology. We train ISLR models that take as input pose estimates of a signer producing a single sign and predict not only the sign but also its phonological characteristics, such as handshape. These auxiliary predictions lead to a nearly 9% absolute gain in sign recognition accuracy on the WLASL benchmark, with consistent ISLR improvements regardless of the underlying model architecture. This work has the potential to accelerate linguistic research on signed languages and to reduce communication barriers between deaf and hearing people.
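The idea of predicting phonological characteristics alongside the sign can be read as a multi-task objective: a sign-classification loss plus auxiliary losses, one per phonological feature. The following is a minimal sketch under stated assumptions. It assumes softmax cross-entropy for every head and a single hypothetical weight `aux_weight` shared by all auxiliary terms; the abstract does not specify the loss or its weighting. It also omits the pose-based encoder that would produce the logits. Apart from "handshape", which the abstract mentions, the feature names and helper functions here are illustrative.

```python
import math
from typing import Dict, List


def cross_entropy(logits: List[float], target: int) -> float:
    """Softmax cross-entropy for a single example, using a stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def multitask_loss(
    sign_logits: List[float],
    sign_target: int,
    phon_logits: Dict[str, List[float]],
    phon_targets: Dict[str, int],
    aux_weight: float = 1.0,  # hypothetical weighting; not given in the abstract
) -> float:
    """Sign loss plus weighted auxiliary losses for each phonological feature."""
    loss = cross_entropy(sign_logits, sign_target)
    for feature, logits in phon_logits.items():
        # Each auxiliary head (e.g. "handshape") contributes its own cross-entropy term.
        loss += aux_weight * cross_entropy(logits, phon_targets[feature])
    return loss


if __name__ == "__main__":
    total = multitask_loss(
        sign_logits=[2.0, 0.5, -1.0],
        sign_target=0,
        phon_logits={"handshape": [0.2, 1.5], "location": [0.0, 0.0, 1.0]},
        phon_targets={"handshape": 1, "location": 2},
    )
    print(f"total loss: {total:.4f}")
```

In this sketch, setting `aux_weight=0.0` recovers plain sign classification. The auxiliary terms only change the training signal, so at inference time a model could still use just the sign head.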