Accurate travel time estimation (TTE) plays a crucial role in intelligent transportation systems, yet it remains challenging because of heterogeneous data sources and complex traffic dynamics. Moreover, traditional approaches typically convert trajectory data into fixed-length representations, which overlooks the inherent variability of real-world motion patterns and often leads to information loss and redundancy. To address these challenges, this paper introduces the Multimodal Dynamic Trajectory Integration (MDTI) framework, a novel multimodal trajectory representation learning approach that integrates GPS sequences, grid trajectories, and road network constraints to improve TTE performance. MDTI employs modality-specific encoders and a multimodal fusion module to capture complementary spatial, temporal, and topological semantics, while a dynamic trajectory modeling mechanism adaptively regulates information density for trajectories of varying lengths. Two self-supervised pretraining objectives, namely contrastive alignment and masked language modeling, further strengthen multimodal consistency and contextual understanding. Extensive experiments on three real-world datasets demonstrate that MDTI consistently outperforms state-of-the-art baselines, confirming its robustness and strong generalization ability. The code is publicly available at: https://github.com/City-Computing/MDTI.
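The abstract does not give the exact form of the contrastive alignment objective. A common choice for aligning two views of the same sample is a symmetric InfoNCE loss, as popularized by CLIP. The sketch below applies that idea to a batch of trajectory embeddings from two modalities, here assumed to be GPS and grid. The helper names, the pairing of modalities, the use of cosine similarity, and the temperature `tau=0.07` are all illustrative assumptions, not MDTI's published implementation.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (treating a zero vector's norm as 1)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cosine_matrix(a: List[Vector], b: List[Vector]) -> List[List[float]]:
    """Pairwise cosine similarities between every row of a and every row of b."""
    a_n = [l2_normalize(v) for v in a]
    b_n = [l2_normalize(v) for v in b]
    return [[sum(x * y for x, y in zip(u, w)) for w in b_n] for u in a_n]


def cross_entropy_diag(logits: List[List[float]]) -> float:
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def contrastive_alignment_loss(
    gps_emb: List[Vector], grid_emb: List[Vector], tau: float = 0.07
) -> float:
    """Hypothetical symmetric InfoNCE loss: trajectory i's GPS embedding should
    match its own grid embedding and not the other trajectories in the batch."""
    sims = cosine_matrix(gps_emb, grid_emb)
    logits = [[s / tau for s in row] for row in sims]   # GPS -> grid direction
    logits_t = [list(col) for col in zip(*logits)]      # grid -> GPS direction
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits_t))


if __name__ == "__main__":
    gps = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
    grid_aligned = [[0.9, 0.1], [0.1, 0.9], [0.6, 0.8]]
    grid_shuffled = [grid_aligned[1], grid_aligned[2], grid_aligned[0]]
    # Correctly paired views should yield a lower loss than mismatched ones.
    print(contrastive_alignment_loss(gps, grid_aligned)
          < contrastive_alignment_loss(gps, grid_shuffled))  # True
```

Each trajectory in the batch acts as a negative for every other trajectory. Minimizing this loss therefore pulls the two modality views of the same trajectory together and pushes views of different trajectories apart, which is one plausible reading of "multimodal consistency".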