This paper introduces PoseLess, a novel framework for robot hand control that eliminates the need for explicit pose estimation by mapping 2D images directly to joint angles via projected visual representations. Our approach leverages synthetic training data generated through randomized joint configurations, enabling zero-shot generalization to real-world scenarios and cross-morphology transfer from robotic to human hands. By feeding these projected representations to a transformer-based decoder, PoseLess achieves robust, low-latency control while mitigating challenges such as depth ambiguity and data scarcity. Experimental results demonstrate competitive joint-angle prediction accuracy without relying on any human-labelled dataset.
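To make the pipeline concrete, the sketch below shows one plausible reading of the abstract's architecture: a 2D image is patch-projected into visual tokens (the "projected representation"), a transformer stack processes the tokens, and a linear head regresses joint angles directly, with supervision coming from randomly sampled joint configurations. This is a minimal illustration under stated assumptions, not the paper's implementation; the class name `PoseLessSketch`, all layer sizes, the encoder-style transformer (the paper specifies a decoder), and the stand-in for rendered frames are hypothetical.

```python
import torch
import torch.nn as nn


class PoseLessSketch(nn.Module):
    """Hypothetical sketch of a direct image-to-joint-angle regressor.

    Patch-projects a 2D hand image into visual tokens, runs them through
    a transformer stack, and regresses joint angles with no intermediate
    pose-estimation stage. All sizes are illustrative assumptions.
    """

    def __init__(self, num_joints=16, d_model=256, patch=16, img=224):
        super().__init__()
        # Project non-overlapping image patches to token embeddings
        # (the "projected representation" of the visual input).
        self.patchify = nn.Conv2d(3, d_model, kernel_size=patch, stride=patch)
        num_tokens = (img // patch) ** 2
        self.pos = nn.Parameter(torch.zeros(1, num_tokens, d_model))
        # Transformer over the visual tokens (encoder-style here for
        # simplicity; the paper describes a transformer-based decoder).
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=4)
        # Regress one angle per joint from the pooled token sequence.
        self.head = nn.Linear(d_model, num_joints)

    def forward(self, images):  # images: (B, 3, 224, 224)
        tokens = self.patchify(images).flatten(2).transpose(1, 2)  # (B, N, D)
        tokens = self.transformer(tokens + self.pos)
        return self.head(tokens.mean(dim=1))  # (B, num_joints) joint angles


# Synthetic supervision in the spirit of the abstract: sample random joint
# configurations as targets and train the model to recover them from the
# corresponding images (random tensors stand in for rendered frames here).
model = PoseLessSketch()
angles_gt = torch.empty(8, 16).uniform_(-1.5, 1.5)  # random joint targets (rad)
images = torch.randn(8, 3, 224, 224)                # stand-in for rendered frames
loss = nn.functional.mse_loss(model(images), angles_gt)
loss.backward()
```

Because the targets are the sampled joint configurations themselves, this setup needs no human-labelled data, which is consistent with the abstract's claim; the rendering step that would produce `images` from `angles_gt` is assumed and not shown.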