Trajectory representation learning (TRL) maps trajectories to vectors that can be used for many downstream tasks. Existing TRL methods take as input either grid trajectories, which capture movement in free space, or road trajectories, which capture movement in a road network. We observe that the two types of trajectories are complementary: grid trajectories provide region and location information, while road trajectories provide road structure and movement regularity. Therefore, we propose a novel multimodal TRL method, dubbed GREEN, that jointly utilizes Grid and Road trajectory Expressions for Effective representatioN learning. In particular, we transform raw GPS trajectories into both grid and road trajectories and tailor two encoders to capture their respective information. To align the two encoders so that they complement each other, we adopt a contrastive loss that encourages them to produce similar embeddings for the same raw trajectory, and we design a masked language model (MLM) loss that uses grid trajectories to help reconstruct masked road trajectories. To learn the final trajectory representation, a dual-modal interactor fuses the outputs of the two encoders via cross-attention. We compare GREEN with 7 state-of-the-art TRL methods on 3 downstream tasks, finding that GREEN consistently outperforms all baselines and improves the accuracy of the best-performing baseline by an average of 15.99\%.
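The dual-modal interactor described above fuses the two encoders' outputs via cross-attention. As a minimal illustration of that fusion step, the sketch below implements single-head cross-attention in plain numpy: road-trajectory token embeddings act as queries over grid-trajectory token embeddings, and the attended result is pooled into one trajectory vector. All names, shapes, and the mean-pooling step are illustrative assumptions, not the paper's actual architecture (which uses learned projections and multiple heads).

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_seq, kv_seq):
    """One modality's tokens (queries) attend over the other modality's
    tokens (keys/values). Single head, no learned projections."""
    d = query_seq.shape[-1]
    scores = query_seq @ kv_seq.T / np.sqrt(d)  # (Lq, Lk) similarity
    weights = softmax(scores, axis=-1)          # rows sum to 1
    return weights @ kv_seq                     # (Lq, d) fused tokens

# toy embeddings: 4 road-trajectory tokens, 6 grid-trajectory tokens, dim 8
rng = np.random.default_rng(0)
road = rng.standard_normal((4, 8))   # hypothetical road-encoder output
grid = rng.standard_normal((6, 8))   # hypothetical grid-encoder output

fused = cross_attention(road, grid)  # road tokens enriched with grid context
trajectory_repr = fused.mean(axis=0) # pool to a single trajectory vector
print(trajectory_repr.shape)
```

In the full method, such a fused sequence would feed the downstream task heads; here the pooled vector simply stands in for the final trajectory representation.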