Information diffusion across new media platforms gradually shapes the perceptions, decisions, and social behaviors of individual users. In communication studies, the well-known Five W's of Communication model (5W Model) gives a clear account of the information diffusion process. Although many studies and corresponding datasets on information diffusion have emerged, a systematic categorization of tasks and an integration of datasets are still lacking. To address this gap, we present a systematic taxonomy of information diffusion tasks and datasets based on the 5W Model framework. We first divide three main tasks (information diffusion prediction, social bot detection, and misinformation detection) into ten subtasks, each with a definition and a dataset analysis. We also compile a repository of publicly available datasets for information diffusion tasks, with links, and compare the datasets on six attributes associated with users and content: user information, social network, bot label, propagation content, propagation network, and veracity label. In addition, we discuss the limitations of current datasets and research topics and outline future directions for information diffusion research. The dataset repository is available at https://github.com/fuxiaG/Information-Diffusion-Datasets.