Website reliability labels underpin almost all research in misinformation detection. However, misinformation sources are often transient, so many labeled lists become obsolete over time. We demonstrate that Search Engine Optimization (SEO) attributes provide strong signals for predicting news site reliability. We introduce a novel attributed webgraph dataset of labeled news domains and their connections to outlinking and backlinking domains. We show that graph neural networks can detect news site reliability from these attributed webgraphs, and that our baseline news site reliability classifier outperforms current state-of-the-art (SoTA) methods on the PoliticalNews dataset, achieving an F1 score of 0.96. Finally, we introduce and evaluate a novel graph-based algorithm for discovering previously unknown misinformation news sources.