Hand gesture recognition is becoming an increasingly prevalent mode of human-computer interaction, especially as cameras proliferate across everyday devices. Despite continued progress in this field, gesture customization remains underexplored. Customization is crucial because it enables users to define and demonstrate gestures that are more natural, memorable, and accessible. However, customization requires efficient use of user-provided data. We introduce a method that enables users to easily design bespoke gestures with a monocular camera from a single demonstration. We employ transformers and meta-learning techniques to address the challenges of few-shot learning. Unlike prior work, our method supports any combination of one-handed, two-handed, static, and dynamic gestures, across different viewpoints. We evaluated our customization method through a user study with 20 gestures collected from 21 participants, achieving up to 97% average recognition accuracy from one demonstration. Our work provides a viable path for vision-based gesture customization, laying the foundation for future advancements in this domain.