Mobile app usage behavior reveals patterns of human activity and is valuable to stakeholders, but collecting such data is costly and raises privacy concerns. Data synthesis can address this by generating artificial datasets that mirror real-world data. In this paper, we propose AppGen, an autoregressive generative model that generates app usage behavior from users' mobility trajectories, improving dataset accessibility and quality. Specifically, AppGen employs a probabilistic diffusion model to simulate the stochastic nature of app usage behavior. Its autoregressive structure captures the intricate sequential relationships between app usage events. Additionally, AppGen uses latent encoding to extract semantic features from spatio-temporal points, and these features guide behavior generation. Together, these designs ensure that the generated behaviors are contextually relevant and faithfully reflect users' environments and past interactions. Experiments on two real-world datasets show that AppGen outperforms state-of-the-art baselines by over 12% on critical metrics and accurately reproduces real-world spatio-temporal patterns. We also test the generated datasets in applications and show that they are suitable for downstream tasks: they preserve algorithm accuracy and the relative ordering of algorithms.
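The three ingredients named above (a latent encoding of spatio-temporal points, an autoregressive loop over usage events, and diffusion-based sampling) can be illustrated with a toy sketch. This is not AppGen's implementation: the weights are random and untrained, the sizes are arbitrary, the names `make_model`, `encode_point`, `sample_embedding`, and `generate` are hypothetical, and summarizing history as a running mean of past app embeddings is an assumption made only to keep the example short. The sampler is a standard DDPM-style reverse process in which the denoiser predicts the clean embedding from the conditioning vector.

```python
import numpy as np

NUM_APPS, DIM, STEPS = 5, 8, 50  # toy sizes, chosen only for illustration


def make_model(seed=0):
    """Random, untrained weights standing in for learned parameters."""
    rng = np.random.default_rng(seed)
    return {
        "app_emb": rng.normal(size=(NUM_APPS, DIM)),     # one embedding per app
        "w_ctx": rng.normal(size=(3, DIM)) * 0.5,        # spatio-temporal encoder
        "w_cond": rng.normal(size=(2 * DIM, DIM)) * 0.5, # toy conditional denoiser
    }


def encode_point(model, point):
    """Map a normalized (lat, lon, hour) triple to a latent context vector."""
    return np.tanh(np.asarray(point, dtype=float) @ model["w_ctx"])


def sample_embedding(model, cond, rng):
    """DDPM-style ancestral sampling of one app embedding given `cond`."""
    betas = np.linspace(1e-4, 0.05, STEPS)
    alphas = 1.0 - betas
    abar = np.cumprod(alphas)
    x0_hat = cond @ model["w_cond"]  # denoiser's estimate of the clean sample
    x = rng.normal(size=DIM)         # start from pure Gaussian noise
    for t in range(STEPS - 1, -1, -1):
        # Noise implied by the current sample and the clean-sample estimate.
        eps = (x - np.sqrt(abar[t]) * x0_hat) / np.sqrt(1.0 - abar[t])
        mean = (x - betas[t] / np.sqrt(1.0 - abar[t]) * eps) / np.sqrt(alphas[t])
        noise = rng.normal(size=DIM) if t > 0 else 0.0  # no noise at the last step
        x = mean + np.sqrt(betas[t]) * noise
    return x


def generate(model, trajectory, seed=0):
    """Autoregressively emit one app index per trajectory point."""
    rng = np.random.default_rng(seed)
    history = np.zeros(DIM)  # assumed summary of past usage events
    apps = []
    for i, point in enumerate(trajectory):
        cond = np.concatenate([encode_point(model, point), history])
        x = sample_embedding(model, cond, rng)
        # Decode to the nearest app embedding.
        app = int(np.argmin(np.linalg.norm(model["app_emb"] - x, axis=1)))
        apps.append(app)
        # Update the running mean of past app embeddings.
        history = history + (model["app_emb"][app] - history) / (i + 1)
    return apps
```

Calling `generate(make_model(), [(0.1, 0.2, 0.3), (0.4, 0.5, 0.6)])` returns one app index per trajectory point. Because each sample is conditioned on both the encoded location and the history of previously generated apps, the output depends on spatio-temporal context and on past events, which is the property the abstract attributes to the combined design.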