We present MixRI, a lightweight network for CAD-based novel object pose estimation in RGB images. It can be applied immediately to a novel object at test time without finetuning. We design the network to meet the demands of real-world applications, emphasizing low memory requirements and fast inference. Unlike existing works that rely on many reference images and networks with many parameters, we use a lightweight network to directly match points between the query and reference images based on multi-view information. Our reference image fusion strategy significantly reduces the number of reference images, which in turn reduces the time needed to process these images and the memory required to store them. The lightweight network further reduces inference time. Despite using fewer reference images, experiments on the seven core datasets of the BOP challenge show that our method achieves results comparable to those of methods that require more reference images and larger networks.