In this paper, we introduce neural texture learning for 6D object pose estimation from synthetic data and a few unlabelled real images. Our major contribution is a novel learning scheme that removes the drawbacks of previous works, namely their strong dependency on co-modalities or additional refinement, which were previously necessary to provide training signals for convergence. We formulate this scheme as two sub-optimisation problems: texture learning and pose learning. We separately learn to predict realistic object textures from real image collections and learn pose estimation from pixel-perfect synthetic data. Combining these two capabilities then allows us to synthesise photorealistic novel views that supervise the pose estimator with accurate geometry. To alleviate the pose noise and segmentation imperfections present during the texture learning phase, we propose a surfel-based adversarial training loss together with texture regularisation from synthetic data. We demonstrate that the proposed approach significantly outperforms recent state-of-the-art methods without ground-truth pose annotations and shows substantial generalisation improvements on unseen scenes. Remarkably, our scheme substantially improves the adopted pose estimators even when they are initialised with much inferior performance.