This paper introduces PoseLess, a novel framework for robot hand control that eliminates the need for explicit pose estimation by directly mapping 2D images to joint angles via tokenized representations. Our approach leverages synthetic training data generated through randomized joint configurations, enabling zero-shot generalization to real-world scenarios and cross-morphology transfer from robotic to human hands. By tokenizing visual inputs and employing a transformer-based decoder, PoseLess achieves robust, low-latency control while addressing challenges such as depth ambiguity and data scarcity. Experimental results demonstrate competitive joint-angle prediction accuracy without relying on any human-labelled dataset.
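The pipeline the abstract describes, tokenizing an image and decoding the token sequence directly to joint angles, can be sketched in untrained NumPy form. All dimensions, weights, and the single-head attention layer below are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def patchify(image, patch=16):
    """Split an HxW image into a sequence of flat patch tokens."""
    H, W = image.shape
    return (image.reshape(H // patch, patch, W // patch, patch)
                 .transpose(0, 2, 1, 3)
                 .reshape(-1, patch * patch))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, Wq, Wk, Wv):
    """Single-head scaled dot-product attention over the token sequence."""
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    scores = softmax(q @ k.T / np.sqrt(k.shape[-1]))
    return scores @ v

# Hypothetical setup: a 64x64 grayscale frame and a 16-joint hand.
image = rng.random((64, 64))
tokens = patchify(image)          # (16, 256): 16 patch tokens of dim 256
d = tokens.shape[-1]
Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.02 for _ in range(3))
W_out = rng.standard_normal((d, 16)) * 0.02

# Pool the attended tokens and regress joint angles, bounded to [-pi, pi].
pooled = self_attention(tokens, Wq, Wk, Wv).mean(axis=0)
joints = np.pi * np.tanh(pooled @ W_out)
```

In the actual system the regression head would be trained on the synthetic data described above (randomized joint configurations rendered to images), so the image-to-angle mapping is learned end to end rather than hand-designed.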