Real-world robotics applications demand object pose estimation methods that work reliably across a variety of scenarios. Modern learning-based approaches require large labeled datasets and tend to perform poorly outside the training domain. Our first contribution is a robust corrector module that corrects pose estimates using depth information, enabling existing methods to generalize better to new test domains; the corrector operates on semantic keypoints (but is also applicable to other pose estimators) and is fully differentiable. Our second contribution is an ensemble self-training approach that simultaneously trains multiple pose estimators in a self-supervised manner. The ensemble self-training architecture first uses the robust corrector to refine the output of each pose estimator, then evaluates the quality of the outputs using observable correctness certificates, and finally uses the observably correct outputs for further training, without requiring external supervision. As an additional contribution, we propose small improvements to a regression-based keypoint detection architecture that enhance its robustness to outliers; these include a robust pooling scheme and a robust centroid computation. Experiments on the YCBV and TLESS datasets show that the proposed ensemble self-training outperforms fully supervised baselines while not requiring 3D annotations on real data.
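As a rough illustration of the refine, certify, and select steps described in the abstract, the following Python sketch runs a toy version of that loop on keypoints. It is not the architecture proposed here: the translation-only nearest-neighbor corrector, the max-residual certificate, the threshold `eps`, and all function and variable names (`toy_corrector`, `is_certified`, `select_pseudo_labels`, `estimator_a`, `estimator_b`) are assumptions made purely for illustration, and the toy corrector is not differentiable.

```python
import numpy as np


def nearest_residuals(keypoints, depth_points):
    """Vector from each keypoint to its nearest depth point."""
    diffs = depth_points[None, :, :] - keypoints[:, None, :]    # (K, D, 3)
    nearest = np.argmin(np.linalg.norm(diffs, axis=2), axis=1)  # (K,)
    return diffs[np.arange(len(keypoints)), nearest]            # (K, 3)


def toy_corrector(keypoints, depth_points, iterations=5):
    """Hypothetical stand-in corrector: translation-only nearest-neighbor alignment."""
    corrected = keypoints.copy()
    for _ in range(iterations):
        shift = nearest_residuals(corrected, depth_points).mean(axis=0)
        corrected = corrected + shift
    return corrected


def is_certified(keypoints, depth_points, eps):
    """Assumed certificate: every keypoint lies within eps of some depth point."""
    residuals = nearest_residuals(keypoints, depth_points)
    return bool(np.linalg.norm(residuals, axis=1).max() < eps)


def select_pseudo_labels(estimates, depth_points, eps=0.005):
    """Refine each estimator's output and keep only the certified ones."""
    selected = {}
    for name, keypoints in estimates.items():
        corrected = toy_corrector(keypoints, depth_points)
        if is_certified(corrected, depth_points, eps):
            selected[name] = corrected  # would serve as a self-training target
    return selected


# Toy depth observation of four object points (metres, made-up values).
depth = np.array([[0.0, 0.0, 1.0],
                  [0.1, 0.0, 1.0],
                  [0.0, 0.1, 1.0],
                  [0.0, 0.0, 1.1]])

estimates = {
    # Slightly shifted prediction: recoverable by the corrector.
    "estimator_a": depth + np.array([0.01, 0.0, 0.0]),
    # Wrongly shaped prediction: no translation can make it fit.
    "estimator_b": np.array([[0.0, 0.0, 1.0],
                             [0.3, 0.0, 1.0],
                             [0.0, 0.3, 1.0],
                             [0.0, 0.0, 1.3]]),
}

selected = select_pseudo_labels(estimates, depth)
print(sorted(selected))
```

In this toy setting only `estimator_a` passes the certificate, and only after correction, since its raw output is 0.01 away from the depth points. The rejected output is simply left out of the pseudo-label set, which mirrors the idea of training only on observably correct outputs without external supervision.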