Kathakali, an Indian classical dance-drama, uses a set of hand gestures called mudras, which form the fundamental units of all its dance moves and postures. Recognizing the depicted mudra is therefore one of the first steps in processing Kathakali digitally. This work treats the problem as a 24-class classification task and proposes a vector-similarity-based approach built on pose estimation, which eliminates the need for further training or fine-tuning. The approach thereby sidesteps the data scarcity that limits the application of AI in similar domains. The method attains 92% accuracy, comparable to or better than existing model-training-based work in the domain, with the added advantage that it still works with data sizes as small as 1 or 5 samples at slightly reduced performance. It can process images, videos, and even real-time streams, and it accepts hand-cropped and full-body images alike. As part of this work, we have developed and publicly released a dataset for Kathakali mudra recognition.
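To make the idea of training-free, similarity-based classification concrete, the following is a minimal sketch and not the authors' implementation. It assumes that a pose estimator has already produced a fixed number of hand landmarks per image (here 21 points with 3 coordinates, a layout some hand-pose estimators use), that cosine similarity is the comparison measure, and that landmarks are centered on the first point before comparison. None of these details are stated in the abstract, the names `normalize_landmarks` and `classify` are hypothetical, and the landmark arrays are synthetic.

```python
import numpy as np


def normalize_landmarks(landmarks):
    """Flatten a (K, D) landmark array into a unit-length vector.

    Assumption: centering on the first landmark and scaling to unit norm
    removes the effect of hand position and hand size in the image.
    """
    pts = np.asarray(landmarks, dtype=float)
    pts = pts - pts[0]                 # translate so the first point is the origin
    vec = pts.ravel()
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


def classify(query, references):
    """Return (label, score) of the reference most similar to the query.

    references: list of (label, landmarks) pairs, one or more per class.
    The score is the cosine similarity of the normalized vectors.
    """
    q = normalize_landmarks(query)
    best_label, best_score = None, -np.inf
    for label, landmarks in references:
        score = float(np.dot(q, normalize_landmarks(landmarks)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score


# Synthetic stand-ins for pose-estimator output: one reference per class.
rng = np.random.default_rng(0)
ref_a = rng.normal(size=(21, 3))
ref_b = rng.normal(size=(21, 3))
references = [("mudra_a", ref_a), ("mudra_b", ref_b)]

# A query that is reference A, rescaled, shifted, and slightly perturbed.
query = 2.5 * ref_a + np.array([0.3, -0.1, 0.2]) + 0.01 * rng.normal(size=(21, 3))

label, score = classify(query, references)
print(label, round(score, 3))
```

Because classification here is only a nearest-reference lookup, adding a class or working with a single sample per class means adding entries to `references`, with no model to retrain.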