With news and information as easy to access as they currently are, it is more important than ever to ensure that people are not misled by what they read. Recently, the rise of neural fake news (AI-generated fake news) and its demonstrated effectiveness at fooling humans have prompted the development of models to detect it. One such model is Grover, which can both detect neural fake news, to help prevent its spread, and generate it, to demonstrate how such a model could be misused to fool human readers. In this work we explore Grover's fake news detection capabilities by performing targeted attacks through perturbations on input news articles. In doing so, we test Grover's resilience to these adversarial attacks and expose potential vulnerabilities that should be addressed in future iterations to ensure it can accurately detect all types of fake news.
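To make the notion of an input perturbation concrete, the following is a minimal sketch of one simple class of adversarial perturbation on article text: substituting visually similar Unicode homoglyphs for Latin characters. This is an illustrative stand-in only; it is not drawn from the Grover codebase, and the attack types actually evaluated in this work may differ.

```python
# Hypothetical sketch: character-level homoglyph perturbation of article text.
# A detector sees a string that differs at the byte level while appearing
# nearly identical to a human reader.

# Latin -> Cyrillic look-alike substitutions (an illustrative subset)
HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e"}

def perturb(text: str, max_swaps: int = 3) -> str:
    """Replace up to `max_swaps` characters with visually similar homoglyphs."""
    out = []
    swaps = 0
    for ch in text:
        if swaps < max_swaps and ch in HOMOGLYPHS:
            out.append(HOMOGLYPHS[ch])
            swaps += 1
        else:
            out.append(ch)
    return "".join(out)

original = "breaking news about the economy"
adversarial = perturb(original)
print(adversarial)  # looks the same to a reader, but is a different string
```

The perturbed article would then be passed to the detector in place of the original to measure whether its real/fake prediction flips.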