Generative models often treat continuous data and discrete events as separate processes, creating a gap in modeling complex systems where the two interact synchronously. To bridge this gap, we introduce JointDiff, a novel diffusion framework that unifies these two processes by generating continuous spatio-temporal data and synchronous discrete events simultaneously. We demonstrate its efficacy in the sports domain by jointly modeling multi-agent trajectories and key possession events. This joint modeling is validated with non-controllable generation and two novel controllable generation scenarios: weak-possessor-guidance, which offers flexible semantic control over game dynamics through a simple list of intended ball possessors, and text-guidance, which enables fine-grained, language-driven generation. To enable conditioning on these guidance signals, we introduce CrossGuid, an effective conditioning operation for multi-agent domains. We also release a new unified sports benchmark enhanced with textual descriptions for soccer and football datasets. JointDiff achieves state-of-the-art performance, demonstrating that joint modeling is crucial for building realistic and controllable generative models for interactive systems. https://guillem-cf.github.io/JointDiff/