Website reliability labels underpin almost all research in misinformation detection. However, misinformation sources often exhibit transient behavior, which makes many such labeled lists obsolete over time. We demonstrate that Search Engine Optimization (SEO) attributes provide strong signals for predicting news site reliability. We introduce a novel attributed webgraph dataset with labeled news domains and their connections to outlinking and backlinking domains. We show that graph neural networks can detect news site reliability from these attributed webgraphs, and that our baseline news site reliability classifier outperforms current state-of-the-art (SoTA) methods on the PoliticalNews dataset, achieving an F1 score of 0.96. Finally, we introduce and evaluate a novel graph-based algorithm for discovering previously unknown misinformation news sources.