In the era of widespread social networks, the rapid dissemination of fake news has emerged as a significant threat that harms many aspects of people's lives. Machine learning and deep learning approaches have been used extensively to identify fake news, but a major obstacle is the limited availability of labeled news datasets. One-Class Learning (OCL), which uses only a small set of labeled data from the interest class, is a suitable way to address this obstacle. In addition, representing data as a graph gives access to both content and structural information, and label propagation on graphs can be effective in predicting node labels. In this paper, we adopt a graph-based model for data representation and introduce a semi-supervised, one-class approach for fake news detection, called LOSS-GAT. First, we employ a two-step label propagation algorithm, using Graph Neural Networks (GNNs) as an initial classifier to categorize news into two groups: interest (fake) and non-interest (real). Next, we enhance the graph structure using structural augmentation techniques. Finally, we predict the labels of all unlabeled data using a GNN whose aggregation function introduces randomness into the local neighborhood of each node. We evaluate the proposed method on five common datasets and compare the results against a set of baseline models, including both OCL and binary-labeled models. The results show that LOSS-GAT improves on these baselines by more than 10% while using only a limited set of labeled fake news. Notably, LOSS-GAT even outperforms binary-labeled models.
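To make the idea of propagating labels from a small one-class seed set concrete, the following is a minimal sketch and not the LOSS-GAT algorithm itself. It assumes a generic restart-style score propagation over a row-normalized adjacency matrix, followed by a median threshold to split nodes into pseudo "interest" and "non-interest" groups. The function names `propagate_seed_scores` and `pseudo_label`, the parameters `alpha` and `iters`, the threshold rule, and the toy graph are all hypothetical choices made for illustration; the abstract does not specify them.

```python
import numpy as np


def propagate_seed_scores(adj, seed_idx, alpha=0.85, iters=50):
    """Spread scores from labeled interest-class seeds over a graph.

    Illustrative restart-style propagation (an assumption, not the
    paper's method): each node repeatedly averages its neighbors'
    scores, and seed nodes keep receiving a constant restart mass.
    """
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    deg[deg == 0] = 1.0  # isolated nodes: avoid division by zero
    transition = adj / deg[:, None]  # row-normalized adjacency

    seed = np.zeros(n)
    seed[list(seed_idx)] = 1.0

    scores = seed.copy()
    for _ in range(iters):
        scores = alpha * (transition @ scores) + (1.0 - alpha) * seed
    return scores


def pseudo_label(scores, threshold=None):
    """Mark nodes scoring above the threshold as pseudo 'interest' nodes.

    The median default is a hypothetical rule chosen for this sketch.
    """
    if threshold is None:
        threshold = np.median(scores)
    return scores > threshold


# Toy graph: two triangles (nodes 0-2 and 3-5) joined by the edge 2-3.
adj = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
])

# Node 0 is the only labeled (fake) news item.
scores = propagate_seed_scores(adj, seed_idx=[0])
is_interest = pseudo_label(scores)
```

In this toy graph, scores decay with distance from the seed, so the seed's own triangle ends up above the median and the other triangle below it. In a pipeline of the kind the abstract describes, pseudo-labels like these would only be a starting point for a GNN classifier rather than final predictions.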