Hand gesture recognition is becoming a more prevalent mode of human-computer interaction, especially as cameras proliferate across everyday devices. Despite continued progress in this field, gesture customization remains underexplored. Customization is crucial because it enables users to define and demonstrate gestures that are more natural, memorable, and accessible. However, customization requires efficient use of user-provided data. We introduce a method that enables users to easily design bespoke gestures with a monocular camera from a single demonstration. We employ transformers and meta-learning techniques to address few-shot learning challenges. Unlike prior work, our method supports any combination of one-handed, two-handed, static, and dynamic gestures, generalizes across viewpoints, and handles irrelevant hand movements. We implement three real-world applications using our customization method, conduct a user study, and achieve up to 94% average recognition accuracy from one demonstration. Our work provides a viable path for vision-based gesture customization, laying the foundation for future advancements in this domain.
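To make the one-shot setting concrete, the sketch below shows a minimal nearest-prototype classifier over hand-landmark sequences: one demonstration per custom gesture, with a similarity threshold that rejects irrelevant hand movement. This is an illustrative stand-in only — the function names are hypothetical, the mean-pooling "embedding" replaces the paper's learned transformer encoder, and no meta-learning is performed.

```python
import math

def embed(landmark_frames):
    # Toy embedding: mean-pool per-frame landmark coordinates into one
    # unit-norm vector. (A stand-in for a learned transformer encoder.)
    dim = len(landmark_frames[0])
    pooled = [sum(f[i] for f in landmark_frames) / len(landmark_frames)
              for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in pooled)) or 1.0
    return [x / norm for x in pooled]

class OneShotGestureClassifier:
    """Nearest-prototype classifier: one demonstration per custom gesture."""

    def __init__(self):
        self.prototypes = {}  # gesture name -> embedding of its one demo

    def register(self, name, demo_frames):
        # Store a single demonstration as the gesture's prototype.
        self.prototypes[name] = embed(demo_frames)

    def predict(self, frames, threshold=0.5):
        # Classify by cosine similarity to the stored prototypes; since
        # all embeddings are unit-norm, a dot product suffices.
        q = embed(frames)
        best, score = None, -1.0
        for name, proto in self.prototypes.items():
            s = sum(a * b for a, b in zip(q, proto))
            if s > score:
                best, score = name, s
        # Below-threshold queries are treated as irrelevant movement.
        return best if score >= threshold else None

# Usage: register one demo per gesture, then classify new sequences.
clf = OneShotGestureClassifier()
clf.register("swipe", [[1.0, 0.0], [0.9, 0.1]])
clf.register("pinch", [[0.0, 1.0], [0.1, 0.9]])
print(clf.predict([[0.95, 0.05]]))   # matches "swipe"
print(clf.predict([[-1.0, 0.0]]))    # rejected as irrelevant -> None
```

The threshold-based rejection mirrors, in miniature, the paper's requirement to ignore hand movements that belong to no registered gesture; a real system would learn the embedding and calibrate the threshold rather than hard-code them.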