Neuroevolution automates the complex task of neural network design but often ignores the inherent adversarial fragility of evolved models, which is a barrier to their adoption in safety-critical scenarios. While robust training methods have received significant attention, the design of architectures that exhibit intrinsic robustness remains largely unexplored. In this paper, we propose NERO-Net, a neuroevolutionary approach for designing convolutional neural networks that are better equipped to resist adversarial attacks. Our search strategy isolates the architectural influence on robustness by avoiding adversarial training during the evolutionary loop. As such, our fitness function promotes candidates that, even when trained with standard (non-robust) methods, achieve high post-attack accuracy without sacrificing accuracy on clean samples. We assess NERO-Net on CIFAR-10 with a specific focus on $L_\infty$-robustness. In particular, the fittest individual that emerged from the evolutionary search reached 33% accuracy against FGSM, which served as an efficient estimator of robustness during the search phase, while maintaining 87% clean accuracy. Further standard training of this individual boosted these metrics to 47% adversarial and 93% clean accuracy, suggesting inherent architectural robustness. Adversarial training brings the model's overall accuracy against AutoAttack up to 40%.
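To make the search-time robustness estimate concrete, the sketch below applies the one-step $L_\infty$ FGSM attack, $x_{adv} = \mathrm{clip}(x + \epsilon \cdot \mathrm{sign}(\nabla_x \mathcal{L}), 0, 1)$, to a model that was trained without any adversarial examples, and scores it on clean and post-attack accuracy. It is a minimal illustration under stated assumptions, not NERO-Net's implementation: a toy logistic-regression model (with its input gradient computed in closed form) stands in for an evolved CNN, and the weighted fitness `alpha * clean + (1 - alpha) * adv` and its parameter `alpha` are hypothetical, since the abstract does not give the exact form of the fitness function.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    # Toy stand-in for a trained candidate network: binary logistic regression.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def fgsm(w, b, x, y, eps):
    """One-step L_inf FGSM: x_adv = clip(x + eps * sign(grad_x loss), 0, 1)."""
    p = predict_proba(w, b, x)
    # Gradient of binary cross-entropy w.r.t. the input of a logistic model: (p - y) * w.
    grad = [(p - y) * wi for wi in w]
    sign = lambda g: (g > 0) - (g < 0)
    # Step every feature by eps in the loss-increasing direction, then keep inputs in [0, 1].
    return [min(1.0, max(0.0, xi + eps * sign(gi))) for xi, gi in zip(x, grad)]


def accuracy(w, b, xs, ys):
    correct = sum(int((predict_proba(w, b, x) >= 0.5) == bool(y)) for x, y in zip(xs, ys))
    return correct / len(xs)


def robustness_fitness(w, b, xs, ys, eps, alpha=0.5):
    """Hypothetical fitness: weighted mix of clean and FGSM (post-attack) accuracy."""
    clean = accuracy(w, b, xs, ys)
    adv_xs = [fgsm(w, b, x, y, eps) for x, y in zip(xs, ys)]
    adv = accuracy(w, b, adv_xs, ys)
    return alpha * clean + (1 - alpha) * adv, clean, adv


if __name__ == "__main__":
    w, b = [1.0, -1.0], 0.0
    xs = [[0.9, 0.1], [0.1, 0.9], [0.6, 0.4], [0.4, 0.6]]
    ys = [1, 0, 1, 0]
    print(robustness_fitness(w, b, xs, ys, eps=0.15))  # (0.75, 1.0, 0.5)
```

In this toy run every sample is classified correctly before the attack, but the two points closest to the decision boundary flip under an $\epsilon = 0.15$ perturbation, so adversarial accuracy drops to 50%. FGSM fits the search loop because it costs a single gradient evaluation per sample, whereas the final evaluation relies on the much stronger and more expensive AutoAttack.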