Diffusion probabilistic models (DPMs) have rapidly become one of the predominant classes of generative models for simulating synthetic data, for instance in computer vision, audio, natural language processing, and biomolecule generation. Here, we propose using DPMs to generate synthetic individual location trajectories (ILTs), which are sequences of variables representing physical locations visited by individuals. ILTs are of major importance in mobility research, both for understanding the mobility behavior of populations and, ultimately, for informing political decision-making. We represent ILTs as multi-dimensional categorical random variables and propose to model their joint distribution using a continuous DPM: we first apply the diffusion process in a continuous unconstrained space and then map the continuous variables into a discrete space. We demonstrate that our model can synthesize realistic ILTs by comparing conditionally and unconditionally generated sequences to real-world ILTs from a GNSS tracking data set. This suggests that the model could be used for synthetic data generation, for example for benchmarking models used in mobility research.
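The two-step idea in the abstract (diffuse in a continuous unconstrained space, then map back to discrete locations) can be illustrated with a minimal sketch. The abstract does not specify the embedding, the noise schedule, or the discretization rule, so everything below is an assumption made for illustration: a centered one-hot embedding, the standard Gaussian forward marginal used in DDPM-style models, and an argmax mapping. The function names, the example trajectory, and the value of `alpha_bar_t` are hypothetical, and no learned denoising network is included.

```python
import numpy as np


def to_continuous(seq, num_locations, scale=1.0):
    """Embed location indices as centered one-hot vectors (assumed embedding)."""
    one_hot = np.eye(num_locations)[seq]      # shape (T, K)
    return scale * (2.0 * one_hot - 1.0)      # entries in {-scale, +scale}


def forward_diffuse(x0, alpha_bar_t, rng):
    """Sample x_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I)."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * noise


def to_discrete(x):
    """Map continuous vectors back to location indices (assumed: argmax per step)."""
    return np.argmax(x, axis=-1)


rng = np.random.default_rng(0)
ilt = np.array([0, 0, 3, 3, 1, 2])            # hypothetical ILT over 4 locations
x0 = to_continuous(ilt, num_locations=4)      # continuous representation
x_t = forward_diffuse(x0, alpha_bar_t=0.9, rng=rng)  # noised representation
recovered = to_discrete(x0)                   # discretizing x0 recovers the ILT
```

In a full model, a trained network would denoise `x_t` step by step back toward a clean continuous sequence, and only the final sample would be passed through the discretization step.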