Despite the progress of learning-based methods for 6D object pose estimation, a trade-off between accuracy and scalability to novel objects remains. Specifically, previous methods for novel objects prioritize generalization and process the target object's 3D shape only indirectly, so they underuse the shape information and are less effective. We present GenFlow, an approach that achieves both accuracy and generalization to novel objects under the guidance of the target object's shape. Our method predicts optical flow between the rendered image and the observed image and refines the 6D pose iteratively. Performance is improved by a 3D shape constraint and by generalizable geometric knowledge learned through an end-to-end differentiable system. We further improve the model with a cascade network architecture that exploits multi-scale correlations and coarse-to-fine refinement. GenFlow ranked first on the unseen object pose estimation benchmarks in both the RGB and RGB-D cases. It also achieves performance competitive with existing state-of-the-art methods on seen object pose estimation without any fine-tuning.
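The iterative refinement described above can be illustrated with a minimal sketch. It is not the GenFlow implementation: the learned flow network is replaced by a hypothetical oracle `flow_fn` that returns the exact 2D displacement between rendered and observed projections, rendering is reduced to projecting a sparse point set, and the pose update is a plain Gauss-Newton step on the reprojection error. All function names, the focal length `f`, and the principal point `c` are assumptions made for illustration.

```python
import numpy as np

def skew(v):
    """Cross-product matrix [v]_x of a 3-vector."""
    return np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])

def rodrigues(w):
    """Rotation matrix from an axis-angle vector w of shape (3,)."""
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)
    K = skew(w / theta)
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * K @ K

def project(points, R, t, f=500.0, c=320.0):
    """Pinhole projection of model points (N, 3) under pose (R, t)."""
    cam = points @ R.T + t
    return f * cam[:, :2] / cam[:, 2:3] + c

def gauss_newton_step(points, R, t, target_uv, f=500.0, c=320.0):
    """One pose update that moves the projected model points toward target_uv."""
    cam = points @ R.T + t
    residual = (target_uv - project(points, R, t, f, c)).reshape(-1)
    J = np.zeros((2 * len(points), 6))
    for i, (x, y, z) in enumerate(cam):
        # Derivative of the projection with respect to the camera-frame point.
        Jp = np.array([[f / z, 0, -f * x / z**2], [0, f / z, -f * y / z**2]])
        # Left perturbation: cam' ~ cam + w x cam + v, so d cam/d w = -[cam]_x.
        J[2 * i:2 * i + 2, :3] = Jp @ (-skew(cam[i]))
        J[2 * i:2 * i + 2, 3:] = Jp
    delta = np.linalg.lstsq(J, residual, rcond=None)[0]
    dR = rodrigues(delta[:3])
    return dR @ R, dR @ t + delta[3:]

def refine_pose(points, R, t, flow_fn, iters=10):
    """Render, query the flow, and update the pose, repeated iters times."""
    for _ in range(iters):
        rendered_uv = project(points, R, t)
        flow = flow_fn(rendered_uv)  # stand-in for a learned flow predictor
        R, t = gauss_newton_step(points, R, t, rendered_uv + flow)
    return R, t

# Synthetic example: a known object shape, a ground-truth pose, and a coarse initial pose.
rng = np.random.default_rng(0)
model_points = rng.uniform(-0.05, 0.05, size=(50, 3))
R_gt, t_gt = rodrigues(np.array([0.3, -0.2, 0.1])), np.array([0.02, -0.01, 0.6])
observed_uv = project(model_points, R_gt, t_gt)

R_init = rodrigues(np.array([0.1, -0.15, 0.1])) @ R_gt
t_init = t_gt + np.array([0.02, -0.03, 0.05])

# Oracle flow (assumption): exact displacement from rendered to observed pixels.
R_est, t_est = refine_pose(model_points, R_init, t_init, lambda uv: observed_uv - uv)
```

Because the 3D model points are known, each flow vector yields a 2D-3D correspondence, which is how the object's shape constrains the pose update in this sketch. A real system would obtain the flow from a network operating on images, so it would be noisy and would not be defined at occluded points.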