Robotic assembly requires object-pose estimation, particularly in setups that avoid costly mechanical constraints. Object symmetry complicates the direct mapping from sensory input to object rotation, because the rotation becomes ambiguous and lacks a unique training target. Proposed solutions evaluate multiple pose hypotheses against the input or predict a probability distribution over poses, but both approaches incur significant computational overhead. Here, we show that representing object rotation with a neural population code overcomes these limitations, enabling a direct mapping to rotation and end-to-end learning. As a result, population codes enable fast and accurate pose estimation. On the T-LESS dataset, we achieve inference in 3.2 milliseconds on an Apple M1 CPU and a Maximum Symmetry-Aware Surface Distance (MSSD) accuracy of 84.7% using only grayscale image input, compared to 69.7% accuracy when mapping directly to pose.
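The abstract does not specify the encoding details, but the general idea of a population code for rotation can be sketched as follows: each unit has a preferred angle and responds according to a smooth tuning curve, so an ambiguous (symmetric) rotation can appear as a multimodal activity pattern rather than a single forced target. This is a minimal illustrative sketch, assuming von Mises tuning curves and a circular-mean (population-vector) readout; the function names and parameters are hypothetical, not the paper's implementation.

```python
import numpy as np

def encode_rotation(theta, n_neurons=36, kappa=8.0):
    """Encode an angle (radians) as a population activity pattern.

    Each neuron has a preferred angle on the circle; its activation
    follows a von Mises tuning curve centered on that preference.
    (Illustrative assumption, not the paper's exact encoding.)
    """
    preferred = np.linspace(0.0, 2.0 * np.pi, n_neurons, endpoint=False)
    return np.exp(kappa * (np.cos(theta - preferred) - 1.0))

def decode_rotation(activations, n_neurons=36):
    """Decode the population back to an angle via the circular mean."""
    preferred = np.linspace(0.0, 2.0 * np.pi, n_neurons, endpoint=False)
    # Population vector: activity-weighted sum of unit vectors.
    z = np.sum(activations * np.exp(1j * preferred))
    return np.angle(z) % (2.0 * np.pi)

theta = np.deg2rad(120.0)
pop = encode_rotation(theta)
theta_hat = decode_rotation(pop)
print(np.rad2deg(theta_hat))
```

Because the decoding averages over the whole population, a network trained to output such a pattern can represent several equally valid rotations of a symmetric object at once, which is what makes a direct, end-to-end-trainable mapping possible without enumerating pose hypotheses.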