Information diffusion prediction on social networks aims to predict the future recipients of a message, with practical applications in marketing and social media. Although many prediction models claim strong performance, general frameworks for evaluating that performance remain limited. Here, we aim to identify a performance characteristic curve for a model, which captures its performance on tasks of varying complexity. We propose a metric based on information entropy to quantify the randomness in diffusion data, and then identify a scaling pattern between this randomness and the model's prediction accuracy. Data points obtained under different sequence lengths, system sizes, and levels of randomness all collapse onto a single curve, which captures the model's inherent capability to make correct predictions as uncertainty increases. Because this curve has properties that allow it to be used to evaluate the model, we define it as the model's performance characteristic curve. We test the validity of the curve on three prediction models from the same family and reach conclusions consistent with existing studies. We also successfully apply the curve to evaluate two distinct models from the literature. Our work reveals a pattern linking data randomness and prediction accuracy. The performance characteristic curve provides a new way to systematically evaluate model performance and sheds light on future work on other frameworks for model evaluation.
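The abstract does not specify the exact form of the entropy-based randomness metric. The sketch below is therefore only an illustration under a stated assumption: randomness is measured as the first-order conditional entropy H(next | current) of user-to-user transitions in observed cascades. It computes that quantity alongside the in-sample top-1 accuracy of a simple frequency-based "most common successor" predictor. This shows the intuition that more random diffusion data tends to lower achievable accuracy. The function names `conditional_entropy` and `top1_accuracy` and the toy cascades are hypothetical, not taken from the paper.

```python
import math
from collections import Counter, defaultdict
from typing import Iterable, Sequence


def conditional_entropy(cascades: Iterable[Sequence[str]]) -> float:
    """Conditional entropy H(next | current), in bits, of user transitions.

    Hypothetical stand-in for an entropy-based randomness metric:
    H = -sum_{x,y} p(x, y) * log2 p(y | x), estimated from counts.
    """
    pair_counts: Counter = Counter()
    src_counts: Counter = Counter()
    for cascade in cascades:
        # Consecutive users in a cascade form (current, next) transitions.
        for cur, nxt in zip(cascade, cascade[1:]):
            pair_counts[(cur, nxt)] += 1
            src_counts[cur] += 1
    total = sum(pair_counts.values())
    if total == 0:
        return 0.0
    h = 0.0
    for (cur, _nxt), n in pair_counts.items():
        p_joint = n / total          # p(x, y)
        p_cond = n / src_counts[cur]  # p(y | x)
        h -= p_joint * math.log2(p_cond)
    return h


def top1_accuracy(cascades: Iterable[Sequence[str]]) -> float:
    """In-sample accuracy of predicting each next user as the most
    frequent successor of the current user (hypothetical baseline)."""
    successors: defaultdict = defaultdict(Counter)
    for cascade in cascades:
        for cur, nxt in zip(cascade, cascade[1:]):
            successors[cur][nxt] += 1
    hits = total = 0
    for counter in successors.values():
        # The best fixed guess for this user is correct max-count times.
        hits += counter.most_common(1)[0][1]
        total += sum(counter.values())
    return hits / total if total else 0.0


if __name__ == "__main__":
    deterministic = [["a", "b", "c"], ["a", "b", "c"]]
    noisy = [["a", "b"], ["a", "c"]]
    for name, data in [("deterministic", deterministic), ("noisy", noisy)]:
        print(name, conditional_entropy(data), top1_accuracy(data))
```

In this toy setting, the deterministic cascades have zero conditional entropy and perfect in-sample accuracy, while the noisy cascades reach one bit of entropy and half the accuracy. A performance characteristic curve, as described above, would relate such entropy values to a trained model's accuracy across many datasets rather than to this simple baseline.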