Detecting misinformation threads is crucial to guarantee a healthy environment on social media. We address the problem using a data set created during the COVID-19 pandemic. It contains cascades of tweets discussing information weakly labeled as reliable or unreliable, based on a previous evaluation of the information source. Models that identify unreliable threads usually rely on textual features. But reliability depends not only on what is said, but also on who says it and to whom. We therefore additionally leverage network information. Following the homophily principle, we hypothesize that users who interact are generally interested in similar topics and spread similar kinds of news, which in turn tend to be either reliable or unreliable. We test several methods to learn representations of the social interactions within the cascades, combining them with deep neural language models in a Multi-Input (MI) framework. By keeping track of the sequence of interactions over time, we improve over previous state-of-the-art models.
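To make the multi-input idea concrete, the following is a minimal sketch in plain Python of how a text representation and an order-aware representation of a cascade's interactions might be fused before classification. It is an illustration under stated assumptions, not the models evaluated in this work: the function names (`encode_cascade`, `fuse_and_score`), the `decay` parameter, the toy recurrent update, and the linear scoring layer are all hypothetical, and the text vector is assumed to come from some separate language model.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def encode_cascade(
    interactions: Sequence[Tuple[str, str]],
    user_vecs: Dict[str, Vector],
    decay: float = 0.5,
) -> Vector:
    """Summarize time-ordered (source_user, target_user) interactions.

    Hypothetical stand-in for a learned sequence encoder: a simple
    recurrent update, so the result depends on the order of interactions.
    """
    dim = len(next(iter(user_vecs.values())))
    state = [0.0] * dim
    for src, dst in interactions:
        # Represent one interaction as the mean of the two user vectors.
        pair = [(a + b) / 2.0 for a, b in zip(user_vecs[src], user_vecs[dst])]
        # Blend the previous state with the new interaction.
        state = [math.tanh(decay * s + p) for s, p in zip(state, pair)]
    return state


def fuse_and_score(
    text_vec: Sequence[float],
    network_vec: Sequence[float],
    weights: Sequence[float],
    bias: float = 0.0,
) -> float:
    """Concatenate both inputs and return a sigmoid score in (0, 1).

    The weights are assumed to be learned elsewhere; here they are given.
    """
    fused = list(text_vec) + list(network_vec)
    if len(fused) != len(weights):
        raise ValueError("weights must match the fused vector length")
    logit = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-logit))
```

In this sketch the two branches are kept separate until the final concatenation, so either encoder could be swapped for a stronger one without changing the fusion step. Because the cascade encoder is recurrent, reordering the same interactions generally yields a different representation, which is the property the sequence-tracking approach relies on.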