We propose Hi5, a new large synthetic dataset for hand pose estimation, and a novel, inexpensive method for collecting high-quality synthetic data that requires no human annotation or validation. Leveraging recent advances in computer graphics, high-fidelity 3D hand models with diverse genders and skin colors, and dynamic environments and camera movements, our data synthesis pipeline allows precise control over data diversity and representation, ensuring robust and fair model training. Using a single consumer PC, we generate a dataset of 583,000 images with accurate pose annotations that closely represents real-world variability. Pose estimation models trained on Hi5 perform competitively on real-hand benchmarks and surpass models trained on real data when tested under occlusions and perturbations. Our experiments indicate that synthetic data is a viable solution to data representation problems in real datasets. Overall, this paper provides a promising new approach to synthetic data creation and annotation that can reduce costs and increase the diversity and quality of data for hand pose estimation.