ViPS: Video-informed Pose Spaces for Auto-Rigged Meshes

Kinematic rigs provide a structured interface for articulating 3D meshes but lack any associated pose space, i.e., an explicit representation of the plausible manifold of joint configurations for a given mesh. Without such a pose space, stochastic sampling or manual manipulation of raw rig parameters easily results in semantic and/or geometric violations, such as anatomical hyperextension and non-physical self-intersections. We propose Video-informed Pose Spaces (ViPS), a feedforward framework that discovers the latent distribution of valid articulations for auto-rigged meshes by distilling motion priors from a pretrained video diffusion model. Unlike existing methods that rely on scarce, artist-authored 4D datasets, or focus on reconstructing instances of individual motions, ViPS transfers generative video model priors into a universal distribution over the given rig parameterization. Differentiable geometric validators applied to the skinned mesh enforce shape-specific integrity without requiring manual regularizers. Our feedforward model reveals a smooth, compact, and controllable pose space. This, in turn, supports sampling for diverse shape variations, manifold projection for inverse kinematics, and temporally coherent trajectories for animation and keyframing. Further, the distilled 3D pose samples serve as semantic proxies to guide video diffusion, effectively closing the loop between generative 2D priors and structured 3D kinematic control. Our evaluations show that ViPS, trained solely using video priors, matches the performance of state-of-the-art models trained on synthetic artist-created 4D data in both plausibility and diversity. Additionally, as a universal model, ViPS exhibits robust zero-shot generalization to out-of-distribution species and unseen skeletal topologies.
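The failure mode described above, where raw rig parameters are sampled without any pose space, can be illustrated with a toy sketch. This is not the ViPS method: it assumes a planar two-bone chain and a hand-picked elbow limit of 0 to 150 degrees, and the names `ELBOW_LIMITS`, `forward_kinematics`, `is_hyperextended`, and `invalid_fraction` are hypothetical. It only shows how often uniform sampling of one joint angle lands outside a plausible range.

```python
import math
import random

# Hypothetical elbow limit in radians (an assumption for illustration,
# not a value taken from the paper).
ELBOW_LIMITS = (0.0, math.radians(150.0))


def forward_kinematics(shoulder, elbow, upper_len=1.0, lower_len=1.0):
    """Return (elbow_xy, wrist_xy) for a planar two-bone chain rooted at the origin."""
    ex = upper_len * math.cos(shoulder)
    ey = upper_len * math.sin(shoulder)
    wx = ex + lower_len * math.cos(shoulder + elbow)
    wy = ey + lower_len * math.sin(shoulder + elbow)
    return (ex, ey), (wx, wy)


def is_hyperextended(elbow):
    """True when the elbow angle falls outside the assumed plausible range."""
    lo, hi = ELBOW_LIMITS
    return not (lo <= elbow <= hi)


def invalid_fraction(num_samples=10000, seed=0):
    """Sample raw elbow angles uniformly in [-pi, pi] and report the invalid share."""
    rng = random.Random(seed)
    invalid = 0
    for _ in range(num_samples):
        elbow = rng.uniform(-math.pi, math.pi)
        if is_hyperextended(elbow):
            invalid += 1
    return invalid / num_samples


if __name__ == "__main__":
    print(f"invalid share of raw samples: {invalid_fraction():.3f}")
```

Under these assumptions the valid range covers 150 of 360 degrees, so roughly 58% of raw samples are expected to be implausible for this single joint. A learned pose space aims to avoid this kind of rejection without relying on hand-authored limits like the one assumed here.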

