The growing reliance on artificial intelligence in safety- and security-critical applications demands effective neural network certification. A challenging real-world use case is "patch attacks", where adversarial patches or lighting conditions obscure part of an image, for example of a traffic sign. A significant step towards certification against patch attacks was recently achieved with PREMAP, which certifies a network using under- and over-approximations of the preimage, that is, the set of inputs that lead to a specified output. While the PREMAP approach is versatile, it is currently limited to fully-connected neural networks of moderate dimensionality. To tackle broader real-world use cases, we present novel algorithmic extensions to PREMAP involving tighter bounds, adaptive Monte Carlo sampling, and improved branching heuristics. Firstly, we demonstrate that, with these efficiency improvements, the extended method significantly outperforms the original PREMAP and scales to convolutional neural networks that were previously intractable. Secondly, we showcase the potential of the preimage approximation methodology for analysing and certifying reliability and robustness on a range of use cases from computer vision and control.