Information now spreads through social media at an unprecedented pace, and discerning truth from misinformation and fake news has become an acute societal challenge. Machine learning (ML) models have been employed to identify fake news, but they remain far from perfect, with limited accuracy, interpretability, and generalizability. In this paper, we enhance ML-based solutions with linguistic input and propose LingML, linguistic-informed ML, for fake news detection. We conducted an experimental study on a popular dataset of fake news from the pandemic. The experimental results show that our proposed solution is highly effective. When only linguistic input is used in ML, there are fewer than two errors in every ten attempts, and the resulting knowledge is highly explainable. When linguistic input is integrated with advanced large-scale ML models for natural language processing, our solution outperforms existing ones with an average error rate of 1.8%. LingML opens a new path that uses linguistics to push the frontier of effective and efficient fake news detection. It also sheds light on real-world multidisciplinary applications that require both ML and domain expertise to achieve optimal performance.