Misinformation spreads rapidly on social media, causing serious damage by influencing public opinion, promoting dangerous behavior, and eroding trust in reliable sources. It spreads too fast for traditional fact-checking, underscoring the need for predictive methods. We introduce CROWDSHIELD, a crowd intelligence-based method for early misinformation prediction. We hypothesize that the crowd's reactions to a post reveal its veracity; specifically, we draw on two signals within a conversation thread: exaggerated assertions/claims in the source post and the positions/stances expressed in its replies. We capture these two dimensions -- stances and claims -- with deep Q-learning, chosen for its proficiency in navigating complex decision spaces and effectively learning network properties. Additionally, we use a transformer-based encoder to develop a comprehensive understanding of both content and context. This multifaceted approach helps ensure the model attends to user interaction while staying anchored in the communication's content. We also propose MIST, a manually annotated Twitter corpus for misinformation detection comprising nearly 200 conversation threads with more than 14K replies. In experiments, CROWDSHIELD outperforms ten baseline systems, improving the macro-F1 score by roughly 4 points. We conduct an ablation study and error analysis to further validate the proposed model's performance. The source code and dataset are available at https://github.com/LCS2-IIITD/CrowdShield.git.
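To make the two-dimensional formulation concrete, the following is a minimal, hypothetical sketch of Q-learning over stance and claim signals. It uses a tabular Q-table whose states pair a reply's stance with whether the source claim is flagged as exaggerated, and whose actions are the thread-level labels "real"/"fake". All names, state/action sets, and rewards here are illustrative assumptions; the paper's actual model is a deep Q-network coupled with a transformer encoder, not this toy table.

```python
import random

# Hypothetical stance and action inventories (not taken from the paper).
STANCES = ["support", "deny", "query", "comment"]
ACTIONS = ["real", "fake"]

def train(threads, episodes=300, alpha=0.1, epsilon=0.1, seed=0):
    """Tabular Q-learning over (stance, exaggerated-claim) states.

    threads: list of (replies, label) pairs, where replies is a list of
    (stance, exaggerated) tuples and label is "real" or "fake".
    """
    rng = random.Random(seed)
    # One Q-row per (stance, exaggerated) state, one value per action.
    q = {(s, e): [0.0, 0.0] for s in STANCES for e in (False, True)}
    for _ in range(episodes):
        for replies, label in threads:
            for stance, exaggerated in replies:
                row = q[(stance, exaggerated)]
                # Epsilon-greedy action selection.
                a = (rng.randrange(2) if rng.random() < epsilon
                     else max(range(2), key=lambda i: row[i]))
                reward = 1.0 if ACTIONS[a] == label else -1.0
                # One-step update; each reply is treated as terminal,
                # so there is no bootstrapped next-state term.
                row[a] += alpha * (reward - row[a])
    return q

def predict(q, replies):
    """Label a thread by majority vote of per-reply greedy actions."""
    votes = [max(range(2), key=lambda i: q[(s, e)][i]) for s, e in replies]
    return ACTIONS[round(sum(votes) / len(votes))]
```

In the full model, the discrete state lookup would be replaced by a network that maps transformer embeddings of the source post and replies to Q-values, which is what lets the approach generalize beyond a fixed stance vocabulary.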