With news and information now easier to access than ever, it is increasingly important to ensure that people are not misled by what they read. Recently, the rise of neural fake news (AI-generated fake news) and its demonstrated effectiveness at fooling humans have prompted the development of models to detect it. One such model is Grover, which can both detect neural fake news, to help prevent its spread, and generate it, to demonstrate how such a model could be misused to fool human readers. In this work, we explore Grover's fake news detection capabilities by performing targeted attacks through perturbations on input news articles. In doing so, we test Grover's resilience to these adversarial attacks and expose some potential vulnerabilities that should be addressed in future iterations to ensure it can accurately detect all types of fake news.