The proliferation of fake news has become a significant concern because of its potential to spread misinformation and manipulate public opinion. In this paper, we present a comprehensive study of fake news detection in Brazilian Portuguese, focusing on journalistic news. We propose a machine learning approach that uses natural language processing techniques, including TF-IDF and Word2Vec, to extract features from text. We evaluate several classification algorithms, including logistic regression, support vector machine, random forest, AdaBoost, and LightGBM, on a dataset containing both true and fake news articles. The proposed approach achieves high accuracy and F1-score, demonstrating its effectiveness in identifying fake news. We also develop a user-friendly web platform, FAKENEWSBR.COM, for verifying the veracity of news articles. The platform provides real-time analysis, allowing users to assess the likelihood that a news article is fake. Through empirical analysis and comparative studies, we show that our approach can contribute to the fight against the spread of fake news and promote more informed media consumption.
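As a rough illustration of the TF-IDF plus logistic regression combination named above, the following minimal sketch builds TF-IDF vectors and trains a classifier using only the Python standard library. It is not the paper's implementation: the toy headlines, the whitespace tokenizer, the smoothed IDF variant, the learning rate, the epoch count, and all function names are assumptions made for illustration, and a real system would more likely rely on established library implementations and a proper labeled corpus.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()

def fit_idf(docs):
    """Smoothed inverse document frequency for every term seen in docs."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}

def tfidf_vector(text, idf, vocab):
    """L2-normalised TF-IDF vector over a fixed vocabulary; unseen terms are ignored."""
    tf = Counter(tokenize(text))
    vec = [tf[term] * idf[term] for term in vocab]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, lr=0.5, epochs=500):
    """Fit logistic regression weights and bias by batch gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict_proba(text, idf, vocab, w, b):
    """Estimated probability that text belongs to class 1 (fake)."""
    x = tfidf_vector(text, idf, vocab)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Invented toy headlines (assumption, not from the paper's dataset); label 1 = fake.
train_texts = [
    "governo anuncia novo programa de vacinação nacional",
    "banco central mantém taxa de juros em reunião",
    "urgente cura milagrosa secreta escondida pelos médicos compartilhe",
    "urgente compartilhe antes que apaguem segredo chocante revelado",
]
labels = [0, 0, 1, 1]

idf = fit_idf(train_texts)
vocab = sorted(idf)
X = [tfidf_vector(t, idf, vocab) for t in train_texts]
w, b = train_logreg(X, labels)

p_fake = predict_proba("urgente compartilhe segredo milagrosa", idf, vocab, w, b)
p_real = predict_proba("banco central mantém taxa de juros", idf, vocab, w, b)
print(f"sensational headline: {p_fake:.2f}")
print(f"neutral headline:     {p_real:.2f}")
```

With only four training headlines the probabilities are not meaningful estimates; the sketch only shows how text becomes a weighted term vector and how a linear classifier turns that vector into a fake-news likelihood, which is the kind of score a verification platform could surface to users.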