Kinematic rigs provide a structured interface for articulating 3D meshes but lack any associated pose space, i.e., an explicit representation of the plausible manifold of joint configurations for a given mesh. Without such a pose space, stochastic sampling or manual manipulation of raw rig parameters easily results in semantic and/or geometric violations, such as anatomical hyperextension and non-physical self-intersections. We propose Video-informed Pose Spaces (ViPS), a feedforward framework that discovers the latent distribution of valid articulations for auto-rigged meshes by distilling motion priors from a pretrained video diffusion model. Unlike existing methods that rely on scarce, artist-authored 4D datasets, or focus on reconstructing instances of individual motions, ViPS transfers generative video model priors into a universal distribution over the given rig parameterization. Differentiable geometric validators applied to the skinned mesh enforce shape-specific integrity without requiring manual regularizers. Our feedforward model reveals a smooth, compact, and controllable pose space. This, in turn, supports sampling for diverse shape variations, manifold projection for inverse kinematics, and temporally coherent trajectories for animation and keyframing. Further, the distilled 3D pose samples serve as semantic proxies to guide video diffusion, effectively closing the loop between generative 2D priors and structured 3D kinematic control. Our evaluations show that ViPS, trained solely using video priors, matches the performance of state-of-the-art models trained on synthetic artist-created 4D data in both plausibility and diversity. Additionally, as a universal model, ViPS exhibits robust zero-shot generalization to out-of-distribution species and unseen skeletal topologies.