Real-world robotics applications demand object pose estimation methods that work reliably across a variety of scenarios. Modern learning-based approaches require large labeled datasets and tend to perform poorly outside the training domain. Our first contribution is a robust corrector module that corrects pose estimates using depth information, enabling existing methods to generalize better to new test domains; the corrector operates on semantic keypoints (but is also applicable to other pose estimators) and is fully differentiable. Our second contribution is an ensemble self-training approach that simultaneously trains multiple pose estimators in a self-supervised manner. The ensemble self-training architecture first uses the robust corrector to refine the output of each pose estimator, then evaluates the quality of the refined outputs using observable correctness certificates, and finally uses the observably correct outputs for further training, without requiring external supervision. As an additional contribution, we propose small improvements to a regression-based keypoint detection architecture to enhance its robustness to outliers; these improvements include a robust pooling scheme and a robust centroid computation. Experiments on the YCBV and TLESS datasets show that the proposed ensemble self-training outperforms fully supervised baselines while not requiring 3D annotations on real data.