4D generation has made remarkable progress in synthesizing dynamic 3D objects from text, image, or video inputs. However, existing methods often represent motion as an implicit deformation field, which limits direct control and editability. To address this issue, we propose SkeletonGaussian, a novel framework for generating editable dynamic 3D Gaussians from monocular video input. Our approach introduces a hierarchical articulated representation that decomposes motion into sparse rigid motion explicitly driven by a skeleton and fine-grained non-rigid motion. Concretely, we extract a robust skeleton and drive rigid motion via linear blend skinning, followed by a hexplane-based refinement for non-rigid deformations, enhancing interpretability and editability. Experimental results demonstrate that SkeletonGaussian surpasses existing methods in generation quality while enabling intuitive motion editing, establishing a new paradigm for editable 4D generation. Project page: https://wusar.github.io/projects/skeletongaussian/
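The rigid stage above relies on linear blend skinning (LBS), which deforms each point as a weight-blended sum of per-bone rigid transforms. Below is a minimal NumPy sketch of the standard LBS formulation; the function name and array shapes are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def linear_blend_skinning(points, weights, rotations, translations):
    """Deform rest-pose points by blending per-bone rigid transforms.

    points:       (N, 3) rest-pose positions (e.g. Gaussian centers)
    weights:      (N, B) skinning weights; each row sums to 1
    rotations:    (B, 3, 3) per-bone rotation matrices
    translations: (B, 3)    per-bone translations
    """
    # Apply every bone's rigid transform to every point: shape (B, N, 3)
    transformed = np.einsum('bij,nj->bni', rotations, points)
    transformed = transformed + translations[:, None, :]
    # Blend per-bone results with the skinning weights: shape (N, 3)
    return np.einsum('nb,bni->ni', weights, transformed)
```

With identity rotations and zero translations the points are returned unchanged; a hexplane-based field would then add a learned non-rigid residual on top of this rigid output.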