Text-guided 3D motion editing has seen success in single-person scenarios, but its extension to multi-person settings remains underexplored due to limited paired data and the complexity of inter-person interactions. We introduce the task of multi-person 3D motion editing, in which a target motion is generated from a source motion and a text instruction. To support this task, we propose InterEdit3D, a new dataset with manually annotated two-person motion edits, together with a Text-guided Multi-human Motion Editing (TMME) benchmark. We present InterEdit, a synchronized classifier-free conditional diffusion model for TMME. It introduces Semantic-Aware Plan Token Alignment, which uses learnable tokens to capture high-level interaction cues, and an Interaction-Aware Frequency Token Alignment strategy that applies the DCT and energy pooling to model periodic motion dynamics. Experiments show that InterEdit improves text-to-motion consistency and edit fidelity, achieving state-of-the-art TMME performance. The dataset and code will be released at https://github.com/YNG916/InterEdit.
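As a rough illustration of the frequency branch, the PyTorch sketch below shows one plausible reading of "DCT and energy pooling": a DCT applied along the time axis of a motion sequence, with squared coefficients pooled into a few frequency bands and projected into tokens. The class name `FrequencyTokenizer`, the band count, and the linear projection are our own illustrative assumptions, not the paper's actual components.

```python
import math
import torch
import torch.nn as nn

def dct_matrix(n: int) -> torch.Tensor:
    """Orthonormal DCT-II basis as an (n, n) matrix (rows = frequencies)."""
    k = torch.arange(n, dtype=torch.float32).unsqueeze(1)  # frequency index
    t = torch.arange(n, dtype=torch.float32).unsqueeze(0)  # time index
    basis = math.sqrt(2.0 / n) * torch.cos(math.pi / n * (t + 0.5) * k)
    basis[0] /= math.sqrt(2.0)  # rescale the DC row for orthonormality
    return basis

class FrequencyTokenizer(nn.Module):
    """Hypothetical frequency tokens: DCT along time, energy pooled per band.

    For simplicity, seq_len must be divisible by num_bands in this sketch.
    """
    def __init__(self, seq_len: int, feat_dim: int,
                 num_bands: int = 8, token_dim: int = 256):
        super().__init__()
        assert seq_len % num_bands == 0
        self.register_buffer("dct", dct_matrix(seq_len))  # (T, T)
        self.num_bands = num_bands
        self.proj = nn.Linear(feat_dim, token_dim)  # assumed token projection

    def forward(self, motion: torch.Tensor) -> torch.Tensor:
        # motion: (B, T, D) per-frame joint features for one person
        spec = torch.einsum("ft,btd->bfd", self.dct, motion)  # DCT over time
        b, f, d = spec.shape
        bands = spec.pow(2).reshape(b, self.num_bands, f // self.num_bands, d)
        energy = bands.mean(dim=2)  # (B, num_bands, D): per-band spectral energy
        return self.proj(energy)    # (B, num_bands, token_dim) frequency tokens

# Usage: frequency tokens for a 64-frame, 66-dim motion sequence
tok = FrequencyTokenizer(seq_len=64, feat_dim=66)
freq_tokens = tok(torch.randn(2, 64, 66))  # -> (2, 8, 256)
```

The low-frequency bands of such tokens capture slow, global movement while the high-frequency bands respond to periodic limb motion, which is consistent with the abstract's motivation for modeling periodic motion dynamics in the frequency domain.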