Motion transfer has emerged as a promising direction for controllable video generation, yet existing methods largely focus on single-object scenarios and struggle when multiple objects require distinct motion patterns. In this work, we present FlexiMMT, the first implicit image-to-video (I2V) motion transfer framework that explicitly enables multi-object, multi-motion transfer. Given a static multi-object image and multiple reference videos, FlexiMMT independently extracts motion representations and accurately assigns them to different objects, supporting flexible recombination and arbitrary motion-to-object mappings. To address the core challenge of cross-object motion entanglement, we introduce a Motion Decoupled Mask Attention Mechanism that uses object-specific masks to constrain attention, ensuring that motion and text tokens influence only their designated regions. We further propose a Differentiated Mask Propagation Mechanism that derives object-specific masks directly from diffusion attention and efficiently propagates them across frames in a progressive manner. Extensive experiments demonstrate that FlexiMMT achieves precise, compositional motion control and state-of-the-art performance in I2V-based multi-object multi-motion transfer. Our project page is: https://ethan-li123.github.io/FlexiMMT_page/
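The core idea of the Motion Decoupled Mask Attention Mechanism, restricting each spatial location so that it attends only to the motion and text tokens assigned to its object, can be illustrated with a minimal masked cross-attention sketch. This is an illustrative simplification, not the paper's implementation: the function name, tensor shapes, and the NumPy formulation are assumptions made for clarity.

```python
import numpy as np

def masked_cross_attention(q, k, v, region_mask):
    """Cross-attention where each query position may only attend to the
    motion/text tokens assigned to its object (a simplified sketch).

    q: (Nq, d) query features for spatial positions in the video latent
    k, v: (Nk, d) conditioning tokens (motion/text), each tied to one object
    region_mask: (Nq, Nk) boolean; True where attention is permitted,
        i.e. the query position lies in the mask of that token's object
    """
    d = q.shape[-1]
    logits = (q @ k.T) / np.sqrt(d)                 # scaled dot-product logits
    logits = np.where(region_mask, logits, -1e9)    # block cross-object links
    # numerically stable softmax over the key axis
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Toy example: two query positions, each belonging to a different object,
# and two conditioning tokens, one per object.
q = np.ones((2, 4))
k = np.array([[1., 0., 0., 0.], [0., 1., 0., 0.]])
v = np.array([[1., 2., 3., 4.], [5., 6., 7., 8.]])
region_mask = np.array([[True, False],   # position 0 -> object-0 token only
                        [False, True]])  # position 1 -> object-1 token only
out = masked_cross_attention(q, k, v, region_mask)
```

Because each position is allowed exactly one token here, the softmax collapses onto it and each output equals the corresponding value row, which is the decoupling effect the mask is meant to enforce.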