Deep neural networks can be exploited with natural adversarial samples, which do not affect human perception. Current approaches often rely on white-box access to the deep neural network to generate these adversarial samples, or synthetically shift the distribution of adversarial samples away from the training distribution. In contrast, we propose EvoSeed, a novel evolutionary strategy-based algorithmic framework for generating photo-realistic natural adversarial samples. Our EvoSeed framework uses auxiliary Conditional Diffusion and Classifier models to operate in a black-box setting. We employ CMA-ES to optimize the search for an initial seed vector which, when processed by the Conditional Diffusion Model, yields a natural adversarial sample that the Classifier Model misclassifies. Experiments show that the generated adversarial images are of high quality, raising concerns that harmful content could be generated to bypass safety classifiers. Our research opens new avenues for understanding the limitations of current safety mechanisms and the risk of plausible attacks against classifier systems using image generation. The project website is available at: https://shashankkotyan.github.io/EvoSeed.
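The seed-search loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the EvoSeed implementation: `toy_generator` and `toy_classifier` are hypothetical stand-ins for the Conditional Diffusion and Classifier models, a simple (mu, lambda) evolution strategy is swapped in for CMA-ES, and the L-infinity bound `eps` on the seed perturbation is an assumption introduced here. The search only queries the classifier's output probabilities, which is what makes the setting black-box.

```python
import math
import random


def toy_generator(seed, label):
    """Hypothetical stand-in for a conditional diffusion model: seed vector -> 'image'."""
    shift = 0.5 if label == 0 else -0.5  # conditioning nudges the output toward the label
    return [math.tanh(z + shift) for z in seed]


def toy_classifier(image):
    """Hypothetical stand-in for the black-box classifier: returns [p(class 0), p(class 1)]."""
    score = sum(image) / len(image)  # mean pixel value decides the class
    p0 = 1.0 / (1.0 + math.exp(-4.0 * score))
    return [p0, 1.0 - p0]


def search_adversarial_seed(init_seed, label, eps=1.5, sigma=0.3,
                            pop_size=16, elite=4, generations=50, rng=None):
    """Simplified (mu, lambda) evolution strategy (NOT CMA-ES) over the seed vector.

    Minimizes the classifier's probability for the conditioned label while
    keeping every seed coordinate within eps of the initial seed (assumed bound).
    """
    rng = rng or random.Random(0)
    dim = len(init_seed)
    mean = list(init_seed)

    def clip(z):
        # Project the candidate back into the L-infinity ball around init_seed.
        return [min(max(v, z0 - eps), z0 + eps) for v, z0 in zip(z, init_seed)]

    def fitness(z):
        # Black-box query: only output probabilities are used, no gradients.
        return toy_classifier(toy_generator(z, label))[label]

    best, best_fit = list(init_seed), fitness(init_seed)
    for _ in range(generations):
        population = [clip([m + sigma * rng.gauss(0.0, 1.0) for m in mean])
                      for _ in range(pop_size)]
        scored = sorted(((fitness(z), z) for z in population), key=lambda t: t[0])
        if scored[0][0] < best_fit:
            best_fit, best = scored[0]
        if best_fit < 0.5:  # two classes: true-label probability below 0.5 means misclassified
            break
        parents = [z for _, z in scored[:elite]]
        mean = [sum(p[i] for p in parents) / elite for i in range(dim)]
    return best, best_fit


seed0 = [0.0] * 8
adv_seed, p_true = search_adversarial_seed(seed0, label=0)
print("true-label probability after search:", p_true)
```

A full CMA-ES would additionally adapt the step size and the covariance of the sampling distribution between generations; the fixed `sigma` here keeps the sketch short.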