Trajectory representation learning on a network enhances our understanding of vehicular traffic patterns and benefits numerous downstream applications. Existing approaches based on classic machine learning or deep learning embed trajectories as dense vectors, which lack interpretability and are inefficient to store and analyze in downstream tasks. In this paper, we propose an explainable trajectory representation learning framework based on dictionary learning. Given a collection of trajectories on a network, the framework extracts a compact dictionary of commonly used subpaths, called "pathlets", that optimally reconstruct each trajectory by simple concatenation. The resulting representation is naturally sparse and encodes strong spatial semantics. We analyze the proposed algorithm theoretically and provide a probabilistic bound on the estimation error of the optimal dictionary. We also propose a hierarchical dictionary learning scheme that keeps the algorithm scalable on large networks and yields a multi-scale trajectory representation. We evaluate the framework on two large-scale real-world taxi datasets. Compared to previous work, the dictionary learned by our method is more compact and achieves a better reconstruction rate on new trajectories. We also demonstrate the promising performance of the method in downstream tasks, including trip time prediction and data compression.
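To make the idea of reconstructing a trajectory by concatenating pathlets concrete, the following is a minimal sketch and not the paper's algorithm. It assumes the dictionary is already given (the paper learns it), that a trajectory is a sequence of edge identifiers, and that "optimal" means using the fewest pathlets. The names `reconstruct` and `sparse_code` are hypothetical and introduced only for illustration.

```python
from typing import List, Optional, Sequence, Tuple

# Assumption: a pathlet is a tuple of consecutive edge IDs on the network.
Pathlet = Tuple[str, ...]


def reconstruct(trajectory: Sequence[str],
                dictionary: Sequence[Pathlet]) -> Optional[List[int]]:
    """Return indices of the fewest pathlets whose concatenation equals
    the trajectory, or None if the dictionary cannot reconstruct it."""
    traj = tuple(trajectory)
    n = len(traj)
    inf = float("inf")
    best = [inf] * (n + 1)   # best[i]: min number of pathlets covering traj[:i]
    back = [None] * (n + 1)  # back[i]: (previous position, pathlet index)
    best[0] = 0

    # Dynamic program over prefix lengths.
    for i in range(n):
        if best[i] == inf:
            continue  # prefix traj[:i] is not reachable
        for k, pathlet in enumerate(dictionary):
            j = i + len(pathlet)
            if j <= n and traj[i:j] == pathlet and best[i] + 1 < best[j]:
                best[j] = best[i] + 1
                back[j] = (i, k)

    if best[n] == inf:
        return None

    # Walk the back-pointers to recover the pathlet sequence in order.
    used: List[int] = []
    pos = n
    while pos > 0:
        pos, k = back[pos]
        used.append(k)
    return used[::-1]


def sparse_code(used: Sequence[int], dict_size: int) -> List[int]:
    """Binary indicator vector marking which pathlets are used."""
    code = [0] * dict_size
    for k in used:
        code[k] = 1
    return code


# Toy dictionary and trajectory (made-up edge IDs, for illustration only).
toy_dictionary = [("a", "b"), ("c", "d"), ("e",), ("a",), ("b", "c")]
toy_used = reconstruct(["a", "b", "c", "d", "e"], toy_dictionary)
toy_code = sparse_code(toy_used, len(toy_dictionary))
```

In this toy setting the trajectory is stored as a short list of pathlet indices, and the indicator vector has one entry per pathlet, so each nonzero entry corresponds to a concrete subpath on the network. This is the sense in which such a representation is sparse and interpretable, in contrast to a dense embedding.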