Information diffusion prediction on social networks aims to predict the future recipients of a message, with practical applications in marketing and social media. Although many prediction models report strong performance, general frameworks for evaluating that performance remain limited. Here, we aim to identify a performance characteristic curve for a model, which captures its performance on tasks of different complexity. We propose a metric based on information entropy to quantify the randomness in diffusion data. We then identify a scaling pattern between this randomness and the prediction accuracy of the model. With the variables properly rescaled, data points from different sequence lengths, system sizes, and levels of randomness all collapse onto a single curve. The curve captures a model's inherent capability to make correct predictions under increasing uncertainty, and we regard it as the performance characteristic curve of the model. We test the validity of the curve with three prediction models from the same family, reaching conclusions in line with existing studies. In addition, we apply the curve to assess the performance of eight state-of-the-art models, obtaining a clear and comprehensive evaluation even for models that are hard to differentiate with conventional metrics. Our work reveals a pattern linking data randomness and prediction accuracy. The performance characteristic curve provides a new way to evaluate model performance systematically, and sheds light on future studies of other frameworks for model evaluation.
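To make the idea of an entropy-based randomness measure concrete, the following is a minimal sketch and not the metric defined in the paper, whose exact form is not given in this abstract. It assumes that randomness can be illustrated by the Shannon entropy of the empirical distribution of recipients, normalized by the logarithm of the system size. The function names, the normalization, and the toy cascades are all hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(symbols):
    """Shannon entropy (in bits) of the empirical distribution of `symbols`."""
    counts = Counter(symbols)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def normalized_entropy(symbols, system_size):
    """Entropy divided by log2(system_size): 0 is deterministic, 1 is uniform.

    Assumption: `system_size` is the number of users that could appear.
    """
    if system_size < 2:
        return 0.0
    return shannon_entropy(symbols) / math.log2(system_size)


# Hypothetical toy cascades: each list is the ordered sequence of user IDs
# that received one message.
cascades = [["a", "b", "c"], ["a", "b", "d"], ["a", "c", "d"]]
recipients = [user for cascade in cascades for user in cascade]

# Randomness of the pooled recipient distribution over a 4-user system.
randomness = normalized_entropy(recipients, system_size=4)
```

Normalizing by the system size is one simple way to make values comparable across networks of different sizes, which is in the spirit of the collapse described above, but the rescaling actually used by the authors may differ.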