RePOSE: Real-Time Iterative Rendering and Refinement for 6D Object Pose Estimation

Iterative pose refinement is a critical processing step for 6D object pose estimation, and its performance depends greatly on the choice of image representation. Image representations learned via deep convolutional neural networks (CNNs) are currently the method of choice because they can robustly encode object keypoint locations. However, CNN-based image representations are computationally expensive to use for iterative pose refinement: a deep network must extract image features once for the input image and then multiple times for the rendered images during the refinement process. Instead of using a CNN to extract image features from a rendered RGB image, we propose to directly render a deep feature image. We call this deep texture rendering, in which a shallow multi-layer perceptron directly regresses a view-invariant image representation of an object. Given a pose estimate, deep texture rendering lets our system render this image representation in under 1 ms. The representation is optimized to make nonlinear 6D pose estimation easier: we add a differentiable Levenberg-Marquardt optimization network and back-propagate the 6D pose alignment error through it. We call our method RePOSE, a Real-time Iterative Rendering and Refinement algorithm for 6D POSE estimation. RePOSE runs at 71 FPS and achieves state-of-the-art accuracy of 51.6% on the Occlusion LineMOD dataset, a 4.1% absolute improvement over the prior art, as well as comparable performance on the YCB-Video dataset with a much faster runtime than other pose refinement methods.
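
The refinement loop described above can be illustrated with a minimal sketch of Levenberg-Marquardt pose refinement on a feature-metric error. This is not the RePOSE implementation and rests on several stated assumptions: the CNN feature map of the input image is replaced by a hypothetical analytic function `feature_map`, the per-vertex features that RePOSE would obtain by deep texture rendering are set to their ideal values at the true pose, sparse projected vertices stand in for a rasterized feature image, and the Jacobian is computed by finite differences rather than analytically. Only the inference-time optimization is shown, not back-propagation through it during training. All function names are illustrative.

```python
import numpy as np


def rodrigues(rvec):
    """Axis-angle vector -> 3x3 rotation matrix (Rodrigues' formula)."""
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)
    k = rvec / theta
    kx = np.array([[0.0, -k[2], k[1]],
                   [k[2], 0.0, -k[0]],
                   [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * kx + (1.0 - np.cos(theta)) * kx @ kx


def project(pose, X, K):
    """Project Nx3 model points with pose = [rvec(3), t(3)] to Nx2 pixels."""
    R = rodrigues(pose[:3])
    Xc = X @ R.T + pose[3:]
    uvw = Xc @ K.T
    return uvw[:, :2] / uvw[:, 2:3]


def feature_map(uv):
    """Hypothetical smooth stand-in for the input image's CNN feature map."""
    u, v = uv[:, 0] / 100.0, uv[:, 1] / 100.0
    return np.stack([u, v, np.sin(u), np.cos(v)], axis=1)


def residuals(pose, X, K, vertex_feats):
    # Feature-metric error: input features sampled at the projected vertices
    # minus the fixed per-vertex ("deep texture") features.
    return (feature_map(project(pose, X, K)) - vertex_feats).ravel()


def numeric_jacobian(f, x, eps=1e-6):
    # Forward-difference Jacobian (an assumption; RePOSE differentiates analytically).
    r0 = f(x)
    J = np.zeros((r0.size, x.size))
    for j in range(x.size):
        dx = np.zeros_like(x)
        dx[j] = eps
        J[:, j] = (f(x + dx) - r0) / eps
    return J


def levenberg_marquardt(f, x0, iters=50, lam=1e-3):
    x = np.asarray(x0, dtype=float).copy()
    cost = np.sum(f(x) ** 2)
    for _ in range(iters):
        r = f(x)
        J = numeric_jacobian(f, x)
        A = J.T @ J
        # Damped normal equations: (J^T J + lam * diag(J^T J)) dx = -J^T r
        dx = np.linalg.solve(A + lam * np.diag(np.diag(A)), -J.T @ r)
        new_cost = np.sum(f(x + dx) ** 2)
        if new_cost < cost:
            # Accept the step and move toward Gauss-Newton.
            x, cost, lam = x + dx, new_cost, lam * 0.5
        else:
            # Reject the step and move toward gradient descent.
            lam *= 10.0
    return x


# Usage: perturb a known pose and refine it back.
rng = np.random.default_rng(0)
X = rng.uniform(-0.05, 0.05, size=(200, 3))  # model points (metres)
K = np.array([[572.4, 0.0, 325.3], [0.0, 573.6, 242.0], [0.0, 0.0, 1.0]])
true_pose = np.array([0.1, -0.2, 0.3, 0.01, -0.02, 0.5])
vertex_feats = feature_map(project(true_pose, X, K))
init_pose = true_pose + np.array([0.05, 0.05, -0.05, 0.01, 0.01, 0.02])
refined = levenberg_marquardt(
    lambda p: residuals(p, X, K, vertex_feats), init_pose)
```

The damping term `lam` interpolates between Gauss-Newton steps (small `lam`) and short gradient-descent-like steps (large `lam`), which is what makes Levenberg-Marquardt robust when the initial pose is off. Because each step only needs the rendered features and their Jacobian, the cost of an iteration is dominated by rendering, which is the step that deep texture rendering makes cheap.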

翻译：对 6D 对象的迭代变形变形改进是 6D 对象进行估计的关键处理步骤, 其性能在很大程度上取决于一个人对图像的表示方式。通过深层神经神经神经网络(CNN) 学习的图像表示方式目前是选择的方法, 因为它们能够对对象关键点位置进行严格的编码。然而, CNN 的图像表示方式在计算上成本很高, 用于迭代变变形改进, 因为它们要求利用深网络提取图像, 一旦输入图像, 并在改进过程中多次提取图像。我们提议使用CNNPN 来从已变 RGB 图像中提取图像特征特征, 而不是直接制作一个深层的图像图像表示方式。我们称之为深层纹度色神经神经网络, 使用浅层多层显示的图像显示显示器图像显示在 1 米以下的图像表示方式。这种图像表示方式最优化, 通过添加不同的 Levenberg- Mard 优化网络的图像网络和后置 D 精度精度显示 6PO 之前的 RPO 的精确, 进行校正校正校正校正校正校正校正校正校正校对校正校正校正校正校正校正校正校正校正校校校校校对校对校对校对校对校对校对校对校对校对校对校对校对校对校对校对校对校对校对校对校对校对校对校对校对校对校对校对校对校对校对校对校对校对校对校对校对校对校对校对校对校对校对校对校对校对校对校对校对校对校对校对校对校对校对校对校对校对校对校对