As the digitization of the travel industry accelerates, analyzing and understanding travelers' behavior becomes increasingly important. However, traveler data are frequently sparse because users interact with travel providers relatively infrequently. Compounding this effect, the multiplication of devices, accounts and platforms used while browsing travel products online also leads to data dispersion. Probabilistic traveler matching can be used to address these challenges. Most existing user matching solutions are not suitable for traveler matching, because a traveler's browsing history is typically short and URLs in the travel industry are highly heterogeneous and contain many tokens. We therefore propose a similarity-based multi-view information fusion approach that learns a better user representation from URLs by treating them as multi-view data. The experimental results show that the proposed multi-view user representation learning takes advantage of the complementary information from different views, highlights the key information in URLs, and performs significantly better than other representation learning solutions on the user matching task.