We present the results of training a large trajectory model on real-world user check-in data. Our approach follows the pre-train and fine-tune paradigm: a base model is pre-trained via masked trajectory modeling and then adapted to various downstream tasks through fine-tuning. To address the challenges posed by noisy data and large spatial vocabularies, we propose a novel spatial tokenization block. Our empirical analysis uses a dataset of over 2 billion check-ins generated by more than 6 million users. By fine-tuning on three downstream tasks, we demonstrate that the base model has learned valuable underlying patterns in the raw data, enabling its application to meaningful trajectory intelligence tasks. Despite some limitations, we believe this work represents an important step toward a foundation model for trajectory intelligence.
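To make the pre-training objective concrete, the following is a minimal sketch of the masking step in masked trajectory modeling, assuming a trajectory has already been converted into a sequence of discrete location tokens. The abstract does not specify the masking ratio, the mask token, or the tokenization scheme, so `MASK`, `mask_trajectory`, the `cell_*` token names, and the default ratio are all hypothetical choices made for illustration, not the authors' implementation.

```python
import random

MASK = "[MASK]"  # hypothetical mask token; not taken from the paper


def mask_trajectory(tokens, mask_ratio=0.15, seed=None):
    """Hide a random subset of location tokens in one trajectory.

    Returns (masked_tokens, targets), where targets maps each masked
    position to the original token a model would be trained to recover.
    The 0.15 default ratio is an assumption borrowed from common
    masked-language-modeling practice.
    """
    if not tokens:
        return [], {}
    rng = random.Random(seed)
    # Mask at least one position so every trajectory yields a training signal.
    n_mask = max(1, round(len(tokens) * mask_ratio))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = list(tokens)
    targets = {}
    for pos in positions:
        targets[pos] = masked[pos]
        masked[pos] = MASK
    return masked, targets


# Hypothetical check-in sequence, already mapped to spatial tokens.
trajectory = ["cell_17", "cell_17", "cell_42", "cell_08", "cell_42", "cell_99"]
masked, targets = mask_trajectory(trajectory, mask_ratio=0.3, seed=0)
print(masked)
print(targets)
```

In a full pipeline, a sequence model would take `masked` as input and be trained to predict the entries of `targets`. The fine-tuning stage would then reuse the pre-trained weights with a task-specific output head.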