Information diffusion prediction on social networks aims to predict the future recipients of a message, with practical applications in marketing and social media. While different prediction models all claim to perform well, general frameworks for performance evaluation remain limited. Here, we aim to identify a performance characteristic curve for a model, which captures its performance on tasks of different complexity. We propose a metric based on information entropy to quantify the randomness in diffusion data, and then identify a scaling pattern between this randomness and the prediction accuracy of the model. Data points obtained for different sequence lengths, system sizes, and levels of randomness all collapse onto a single curve, which captures a model's inherent capability to make correct predictions under increasing uncertainty. Because this curve can be used to evaluate the model, we define it as the performance characteristic curve of the model. We test the validity of the curve on three prediction models from the same family and reach conclusions in line with existing studies. We also apply the curve to evaluate two distinct models from the literature. Our work reveals a pattern linking data randomness and prediction accuracy. The performance characteristic curve provides a new way to systematically evaluate model performance and sheds light on future studies of other frameworks for model evaluation.
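The abstract does not state how the entropy-based randomness metric is defined, so the following is only a minimal sketch of the general idea, assuming a plain Shannon entropy over the empirical distribution of recipients, normalized by the logarithm of the system size. The function name `normalized_entropy` and its parameters are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def normalized_entropy(recipients, system_size):
    """Shannon entropy of the observed recipients, scaled to [0, 1].

    ASSUMPTION: this is a generic illustration, not the paper's metric.
    `recipients` is any sequence of hashable user identifiers and
    `system_size` is the number of users that could receive the message.
    """
    if system_size < 2:
        raise ValueError("system_size must be at least 2")
    counts = Counter(recipients)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # H = sum over users of p * log2(1 / p), with p the empirical frequency.
    entropy = sum(
        (c / total) * math.log2(total / c) for c in counts.values()
    )
    # Divide by the maximum possible entropy, log2(system_size).
    return entropy / math.log2(system_size)


if __name__ == "__main__":
    # A fully predictable sequence versus a uniformly spread one.
    print(normalized_entropy(["a", "a", "a", "a"], system_size=4))
    print(normalized_entropy(["a", "b", "c", "d"], system_size=4))
```

Under these assumptions, a value near 0 indicates highly predictable diffusion data and a value near 1 indicates data close to uniformly random. Plotting a model's prediction accuracy against such a value for many datasets is one way to look for the kind of collapse onto a single curve that the abstract describes.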