Website reliability labels underpin almost all research in misinformation detection. However, misinformation sources are often transient, so many labeled lists become obsolete over time. We show that Search Engine Optimization (SEO) attributes provide strong signals for predicting news site reliability. We introduce a novel attributed webgraph dataset of labeled news domains and their connections to outlinking and backlinking domains. Using these attributed webgraphs, we show that graph neural networks can detect news site reliability, and that our baseline news site reliability classifier outperforms current state-of-the-art (SoTA) methods on the PoliticalNews dataset, achieving an F1 score of 0.96. Finally, we introduce and evaluate a novel graph-based algorithm for discovering previously unknown misinformation news sources.