Predicting the popularity of information cascades is a fundamental problem in social networks. Capturing temporal attributes and cascade role information (e.g., cascade graphs and cascade sequences) is necessary for understanding information cascades. Current methods rarely unify this information for popularity prediction, which prevents them from modeling the full properties of cascades and limits their prediction performance. In this paper, we propose an explicit Time embedding based Cascade Attention Network (TCAN) as a novel popularity prediction architecture for large-scale information networks. TCAN integrates temporal attributes (i.e., periodicity, linearity, and non-linear scaling) into node features via a general time embedding approach (TE), and then employs a cascade graph attention encoder (CGAT) and a cascade sequence attention encoder (CSAT) to learn representations of cascade graphs and cascade sequences. We validate our method on two real-world datasets (i.e., Weibo and APS) with tens of thousands of cascade samples. Experimental results show that TCAN obtains mean squared logarithmic errors (MSLE) of 2.007 and 1.201 and running times of 1.76 hours and 0.15 hours on the two datasets, respectively. Furthermore, TCAN outperforms other representative baselines by 10.4%, 3.8%, and 10.4% on average in terms of MSLE, MAE, and R-squared, respectively, while maintaining good interpretability.
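For readers unfamiliar with the headline metric, the following is a minimal sketch of how a mean squared logarithmic error can be computed over predicted and observed cascade sizes. The abstract does not state the logarithm base or the offset used, so the natural logarithm with a +1 offset is an assumption here, and the function name `msle` is illustrative rather than taken from the paper.

```python
import math


def msle(y_true, y_pred):
    """Mean squared logarithmic error between observed and predicted popularity.

    Assumption: natural log with a +1 offset, i.e. log(1 + x); the paper's
    exact base and offset are not given in the abstract.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_log_errors = [
        (math.log1p(pred) - math.log1p(true)) ** 2
        for true, pred in zip(y_true, y_pred)
    ]
    return sum(squared_log_errors) / len(squared_log_errors)


if __name__ == "__main__":
    # Hypothetical cascade sizes, for illustration only (not data from the paper).
    observed = [120, 15, 3400]
    predicted = [100, 20, 2900]
    print(f"MSLE = {msle(observed, predicted):.4f}")
```

Because the error is taken on a log scale, under-predicting a cascade of 3400 reshares by a few hundred is penalized far less than the same absolute miss on a cascade of a few dozen, which suits heavy-tailed popularity distributions.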