We present Sketch2Colab, which turns storyboard-style 2D sketches into coherent, object-aware 3D multi-human motion with fine-grained control over agents, joints, timing, and contacts. Conventional diffusion-based motion generators have advanced realism; however, achieving precise adherence to rich interaction constraints typically demands extensive training and/or costly posterior guidance, and performance can degrade under strong multi-entity conditioning. Sketch2Colab instead first learns a sketch-driven diffusion prior and then distills it into an efficient rectified-flow student operating in latent space for fast, stable sampling. Differentiable energies over keyframes, trajectories, and physics-based constraints directly shape the student's transport field, steering samples toward motions that faithfully satisfy the storyboard while remaining physically plausible. To capture coordinated interaction, we augment the continuous flow with a continuous-time Markov chain (CTMC) planner that schedules discrete events such as touches, grasps, and handoffs, modulating the dynamics to produce crisp, well-phased human-object-human collaborations. Experiments on CORE4D and InterHuman show that Sketch2Colab achieves state-of-the-art constraint adherence and perceptual quality while offering significantly faster inference than diffusion-only baselines.
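The two mechanisms above can be illustrated with a minimal toy sketch. This is not the paper's implementation: `student_velocity`, `keyframe_energy_grad`, `guided_sample`, and `ctmc_events` are hypothetical names, the velocity field is a hand-written stand-in for the distilled rectified-flow network, and the energy is a simple quadratic keyframe penalty. It only shows the two ideas in miniature: an energy gradient added to the student's transport field during Euler integration, and a Gillespie-style simulation of a continuous-time Markov chain over discrete interaction events.

```python
import numpy as np

def student_velocity(x, t):
    """Toy stand-in for the distilled rectified-flow field (a learned
    network in the real system): straight transport toward a data mode."""
    data_mode = np.array([1.0, 2.0])
    return data_mode - x

def keyframe_energy_grad(x, keyframe):
    """Gradient of a quadratic keyframe energy E(x) = 0.5*||x - keyframe||^2,
    playing the role of the differentiable constraint energies."""
    return x - keyframe

def guided_sample(x0, keyframe, steps=100, guidance=0.5):
    """Euler integration of dx/dt = v(x, t) - guidance * dE/dx, so the
    energy gradient directly reshapes the student's transport field."""
    x, dt = np.asarray(x0, dtype=float).copy(), 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = student_velocity(x, t) - guidance * keyframe_energy_grad(x, keyframe)
        x = x + dt * v
    return x

def ctmc_events(rates, state, horizon, rng):
    """Gillespie-style simulation of a CTMC over discrete interaction
    events (e.g. free -> touch -> grasp). rates[i][j] is the jump rate
    from state i to state j; returns a list of (time, new_state) events."""
    rates = np.asarray(rates, dtype=float)
    t, events = 0.0, []
    while True:
        out = rates[state].copy()
        out[state] = 0.0
        total = out.sum()
        if total <= 0.0:
            break  # absorbing state: no further events
        t += rng.exponential(1.0 / total)  # exponential waiting time
        if t >= horizon:
            break
        state = rng.choice(len(out), p=out / total)
        events.append((t, state))
    return events
```

A usage example under the same toy assumptions: `guided_sample(np.zeros(2), keyframe=np.array([1.0, 2.0]))` integrates the guided flow toward the keyframe, while `ctmc_events([[0, 2, 0], [0, 0, 2], [0, 0, 0]], state=0, horizon=5.0, rng=np.random.default_rng(0))` yields a time-stamped schedule of event transitions that could modulate the continuous dynamics.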