In this paper, we introduce RealDex, a pioneering dataset that captures authentic dexterous hand grasping motions infused with human behavioral patterns, enriched by multi-view and multimodal visual data. Using a teleoperation system, we synchronize human and robot hand poses in real time. This collection of human-like motions is crucial for training dexterous hands to mimic human movements more naturally and precisely. RealDex holds great promise for advancing humanoid robots toward automated perception, cognition, and manipulation in real-world scenarios. Moreover, we introduce a dexterous grasping motion generation framework that aligns with human experience and improves real-world applicability by effectively using Multimodal Large Language Models. Extensive experiments demonstrate the superior performance of our method on RealDex and other open datasets. The complete dataset and code will be made available upon publication of this work.