AnyHand: A Large-Scale Synthetic Dataset for RGB(-D) Hand Pose Estimation

We present AnyHand, a large-scale synthetic dataset designed to advance the state of the art in 3D hand pose estimation from both RGB-only and RGB-D inputs. Recent work with foundation approaches has shown that increasing the quantity and diversity of training data can markedly improve the performance and robustness of hand pose estimation. However, existing real-world datasets for this task are limited in coverage, and prior synthetic datasets rarely provide occlusions, arm details, and aligned depth together at scale. To address this bottleneck, AnyHand contains 2.5M single-hand and 4.1M hand-object interaction RGB-D images with rich geometric annotations. In the RGB-only setting, we show that extending the original training sets of existing baselines with AnyHand yields significant gains on multiple benchmarks (FreiHAND and HO-3D), even when the architecture and training scheme are kept fixed. Moreover, the model trained with AnyHand generalizes better to the out-of-domain HO-Cap dataset without any fine-tuning. We also contribute a lightweight depth fusion module that can be easily integrated into existing RGB-based models. Trained with AnyHand, the resulting RGB-D model achieves superior performance on the HO-3D benchmark, showing the benefits of depth integration and the effectiveness of our synthetic data.

