Pairwise pose estimation from images with little or no overlap is an open challenge in computer vision. Existing methods, even those trained on large-scale datasets, struggle in these scenarios due to the lack of identifiable correspondences or visual overlap. Inspired by the human ability to infer spatial relationships from diverse scenes, we propose InterPose, a novel approach that leverages the rich priors encoded within pre-trained generative video models. Specifically, we use a video model to hallucinate intermediate frames between two input images, effectively creating a dense visual transition that significantly simplifies the problem of pose estimation. Since current video models can still produce implausible motion or inconsistent geometry, we introduce a self-consistency score that measures the agreement among pose predictions derived from independently sampled videos. We demonstrate that our approach generalizes across three state-of-the-art video models and show consistent improvements over the state-of-the-art DUSt3R on four diverse datasets encompassing indoor, outdoor, and object-centric scenes. Our findings suggest a promising avenue for improving pose estimation models by leveraging large generative models trained on vast amounts of video data, which is more readily available than 3D data. See our project page for results: https://inter-pose.github.io/.
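To make the pipeline described above concrete, the following is a minimal sketch of the sample-then-score idea: hallucinate several candidate videos between the two input images, chain adjacent-frame relative poses along each video, and keep the prediction that agrees most with the others. The helper names (`sample_intermediate_video`, `estimate_relative_pose`) and the particular consistency measure (mean pairwise rotation geodesic distance) are illustrative assumptions, not the paper's exact formulation; in practice the frame interpolation would come from a pre-trained video model and the pairwise poses from a model such as DUSt3R.

```python
import numpy as np

# Hypothetical stand-ins for the components named in the abstract.
def sample_intermediate_video(img_a, img_b, num_frames, seed):
    """Hallucinate num_frames frames transitioning from img_a to img_b
    (e.g., with a pre-trained generative video model)."""
    raise NotImplementedError

def estimate_relative_pose(frame1, frame2):
    """Return a 4x4 relative camera pose between two overlapping frames
    (e.g., from DUSt3R on an adjacent frame pair)."""
    raise NotImplementedError

def chain_poses(frames):
    """Compose adjacent-frame relative poses into one end-to-end pose."""
    pose = np.eye(4)
    for f1, f2 in zip(frames[:-1], frames[1:]):
        pose = estimate_relative_pose(f1, f2) @ pose
    return pose

def rotation_distance(pose1, pose2):
    """Geodesic angle (radians) between the rotation parts of two poses."""
    r = pose1[:3, :3].T @ pose2[:3, :3]
    return np.arccos(np.clip((np.trace(r) - 1.0) / 2.0, -1.0, 1.0))

def interpose(img_a, img_b, num_samples=8, num_frames=16):
    # Each sampled video yields one candidate end-to-end pose.
    poses = []
    for seed in range(num_samples):
        frames = sample_intermediate_video(img_a, img_b, num_frames, seed)
        poses.append(chain_poses([img_a, *frames, img_b]))
    # Self-consistency score (one plausible instantiation): prefer the
    # candidate whose rotation agrees most with the other samples.
    scores = [-np.mean([rotation_distance(p, q) for q in poses])
              for p in poses]
    return poses[int(np.argmax(scores))]
```

This scoring step assumes that plausible videos produce pose predictions that cluster together, while implausible motion or inconsistent geometry yields outliers that are voted down.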