HaSPeR: An Image Repository for Hand Shadow Puppet Recognition

Hand shadow puppetry, also known as shadowgraphy or ombromanie, is a form of theatrical art and storytelling in which hand shadows are projected onto flat surfaces to create illusions of living creatures. Skilled performers create these silhouettes through hand positioning, finger movements, and dexterous gestures so that the shadows resemble animals and objects. Owing to a shortage of practitioners and a seismic shift in people's entertainment preferences, this art form is on the verge of extinction. To facilitate its preservation and bring it to a wider audience, we introduce HaSPeR, a novel dataset of 15,000 images of hand shadow puppets across 15 classes, extracted from video clips of both professional and amateur hand shadow puppeteers. We provide a detailed statistical analysis of the dataset and employ a range of pretrained image classification models to establish baselines. Our findings show that skip-connected convolutional models substantially outperform attention-based transformer architectures. We also find that lightweight models suited to mobile applications and embedded devices, such as MobileNetV2, perform comparatively well. We surmise that such low-latency architectures can be useful for developing ombromanie teaching tools, and we build a prototype application to explore this idea. Focusing on the best-performing model, ResNet34, we conduct comprehensive feature-space, explainability, and error analyses to gain insight into its decision-making process. To the best of our knowledge, this is the first documented dataset and research effort aimed at preserving this dying art for future generations with computer vision approaches. Our code and data will be publicly available.
