This study explores the generation and evaluation of synthetic fake news produced through fact-based manipulations using large language models (LLMs). We introduce a novel methodology that extracts key facts from real articles, modifies them, and regenerates the content to simulate fake news while preserving coherence. To assess the quality of the generated content, we propose a set of evaluation metrics: coherence, dissimilarity, and correctness. We also investigate the use of synthetic data for fake news classification, comparing traditional machine learning models with transformer-based models such as BERT. Our experiments demonstrate that transformer models, especially BERT, leverage synthetic data effectively for fake news detection, with performance gains appearing even when only a small proportion of synthetic data is used. In addition, we find that fact-verification features, which target factual inconsistencies, are the most effective at distinguishing synthetic fake news. The study highlights the potential of synthetic data to strengthen fake news detection systems, offers insights for future research, and suggests that targeted improvements in synthetic data generation can further strengthen detection models.
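To make the extract-modify-regenerate pipeline concrete, the sketch below shows one plausible way to wire it up with an LLM. Everything here is an illustrative assumption rather than the authors' actual implementation: the prompt wording, the `ask` helper, the model name, and the use of an OpenAI-style chat-completions API are all hypothetical choices.

```python
# Minimal sketch of the fact-based manipulation pipeline described in the
# abstract: extract key facts, perturb them, regenerate a coherent article.
# Prompts, model name, and helper structure are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def ask(prompt: str) -> str:
    """Send a single-turn prompt to the LLM and return the text reply."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # hypothetical choice; any chat model works
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content


def generate_fake_article(real_article: str) -> str:
    # Step 1: extract the key factual claims from the real article.
    facts = ask(
        "List the key factual claims in this article, one per line:\n\n"
        + real_article
    )
    # Step 2: perturb the facts (e.g., swap a name, date, place, or number).
    altered = ask(
        "Rewrite each claim below with one factual detail changed "
        "(a name, date, place, or number), keeping the original style:\n\n"
        + facts
    )
    # Step 3: regenerate a fluent article around the altered facts, so the
    # output stays coherent while no longer matching the source facts.
    return ask(
        "Write a short news article that fluently incorporates these "
        "claims:\n\n" + altered
    )
```

Under this reading, coherence would be judged on the regenerated article, dissimilarity against the source article, and correctness against the original extracted facts; the abstract does not fix these definitions, so this mapping is only one plausible interpretation.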