Breaking Data Silos: Towards Open and Scalable Mobility Foundation Models via Generative Continual Learning

Human mobility is a fundamental pillar of urban science and sustainability, providing critical insights into energy consumption, carbon emissions, and public health. However, the discovery of universal mobility laws is currently hindered by the ``data silo'' problem: institutional boundaries and privacy regulations fragment the large-scale datasets such discovery requires. In this paper, we propose MoveGCL, a framework for collaborative, decentralized mobility science based on generative continual learning. MoveGCL enables distributed data holders to jointly and incrementally train a foundation model without compromising individual privacy. At its core, MoveGCL replays synthetic trajectories generated by a teacher model and adopts a mobility-pattern-aware Mixture-of-Experts (MoE) architecture. Together, these allow the model to capture the distinctive characteristics of diverse urban structures while mitigating catastrophic forgetting of previously learned cities. A layer-wise progressive adaptation strategy further stabilizes convergence as new urban domains are integrated. Experiments on six urban datasets from around the world show that MoveGCL achieves performance on par with joint training, which had not previously been attained under siloed conditions. This work offers a scalable, privacy-preserving pathway toward Open Mobility Science, enabling researchers to tackle global sustainability challenges through cross-institutional AI collaboration. To facilitate reproducibility and future research, we release the code and models at \color{blue}{https://github.com/tsinghua-fib-lab/MoveGCL}.
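The generative-replay idea behind MoveGCL can be illustrated with a deliberately simplified sketch. It assumes a toy first-order Markov next-location model (the hypothetical `MarkovMobilityModel` class) in place of MoveGCL's actual MoE foundation model, and a hypothetical `continual_update` helper. The previous model acts as a teacher that samples synthetic trajectories; these are mixed with the new city's data, so the updated model keeps the old city's transitions without ever touching that city's raw data.

```python
import random
from collections import Counter, defaultdict


class MarkovMobilityModel:
    """Hypothetical toy model: transition counts between location IDs.

    Stands in for a real mobility foundation model purely for illustration.
    """

    def __init__(self):
        self.counts = defaultdict(Counter)

    def fit(self, trajectories):
        # Count observed transitions a -> b in every trajectory.
        for traj in trajectories:
            for a, b in zip(traj, traj[1:]):
                self.counts[a][b] += 1
        return self

    def prob(self, a, b):
        # Empirical probability of moving from a to b (0.0 if a is unseen).
        nxt = self.counts.get(a, Counter())
        total = sum(nxt.values())
        return nxt[b] / total if total else 0.0

    def sample(self, start, length, rng):
        # Generate a synthetic trajectory by walking the learned transitions.
        traj = [start]
        for _ in range(length - 1):
            nxt = self.counts.get(traj[-1])
            if not nxt:
                break
            locs, weights = zip(*nxt.items())
            traj.append(rng.choices(locs, weights=weights)[0])
        return traj


def continual_update(teacher, new_city_trajs, n_replay, length, rng):
    """Hypothetical helper: train a student on new-city data plus replayed
    synthetic trajectories from the teacher (no old raw data is used)."""
    starts = list(teacher.counts)
    replay = [teacher.sample(rng.choice(starts), length, rng) for _ in range(n_replay)]
    return MarkovMobilityModel().fit(new_city_trajs + replay)


if __name__ == "__main__":
    rng = random.Random(0)
    city_a = [[0, 1, 2, 0]] * 50      # hypothetical "old" city
    city_b = [[10, 11, 12, 10]] * 50  # hypothetical "new" city
    teacher = MarkovMobilityModel().fit(city_a)
    naive = MarkovMobilityModel().fit(city_b)  # no replay: forgets city A
    student = continual_update(teacher, city_b, n_replay=50, length=4, rng=rng)
    print(naive.prob(0, 1), student.prob(0, 1))  # 0.0 1.0
```

Training on city B alone assigns zero probability to city A's transitions, while the replay-trained student retains them. The mobility-pattern-aware MoE routing and layer-wise progressive adaptation described above are not modeled in this sketch.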