An important aspect of developing reliable deep learning systems is devising strategies that make these systems robust to adversarial attacks. There is a long line of work on defenses against these attacks, but recently researchers have begun to study ways to reverse engineer the attack process. This allows us not only to defend against several attack models, but also to classify the threat model. However, the reverse engineering process still lacks theoretical guarantees. Current approaches that give any guarantees assume that the data lies in a union of linear subspaces, which is not a valid assumption for more complex datasets. In this paper, we build on prior work and propose a novel framework for reverse engineering of deceptions which supposes that the clean data lies in the range of a GAN. To classify the signal and attack, we jointly solve a GAN inversion problem and a block-sparse recovery problem. We provide the first deterministic linear convergence guarantees in the literature for this problem. We also empirically demonstrate the merits of the proposed approach on several nonlinear datasets as compared to state-of-the-art methods.
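As a rough illustration of what jointly solving a GAN inversion problem and a block-sparse recovery problem can look like, the sketch below alternates a gradient step on a latent code with a proximal (block soft-thresholding) step on an attack coefficient vector. The abstract does not state the objective, so everything here is an assumption: the objective 0.5 * ||x_adv - G(z) - D c||^2 + lam * sum_b ||c[b]||_2, the attack dictionary `D`, the block structure `blocks`, and the one-layer `generator` that stands in for a trained GAN. It is not the paper's algorithm.

```python
import numpy as np

def block_soft_threshold(c, blocks, tau):
    """Proximal operator of tau * sum_b ||c[b]||_2 (group-lasso shrinkage)."""
    out = np.zeros_like(c)
    for idx in blocks:
        norm = np.linalg.norm(c[idx])
        if norm > tau:
            out[idx] = (1.0 - tau / norm) * c[idx]
    return out

def generator(z, W):
    """Hypothetical toy generator: a single tanh layer, not a trained GAN."""
    return np.tanh(W @ z)

def objective(x_adv, z, c, W, D, blocks, lam):
    """Assumed objective: squared residual plus a block-sparsity penalty on c."""
    r = x_adv - generator(z, W) - D @ c
    return 0.5 * (r @ r) + lam * sum(np.linalg.norm(c[idx]) for idx in blocks)

def joint_step(x_adv, z, c, W, D, blocks, lam, step_z, step_c):
    """One alternating update: gradient step in z, proximal gradient step in c."""
    g = generator(z, W)
    r = x_adv - g - D @ c
    # Gradient of 0.5 * ||r||^2 w.r.t. z is -J^T r, with J = diag(1 - g^2) W.
    grad_z = -(W.T @ ((1.0 - g ** 2) * r))
    z_new = z - step_z * grad_z
    # Recompute the residual at the updated latent code before updating c.
    r = x_adv - generator(z_new, W) - D @ c
    c_new = block_soft_threshold(c + step_c * (D.T @ r), blocks, step_c * lam)
    return z_new, c_new
```

In this sketch, the blocks of `c` that remain nonzero after iterating would indicate which group of dictionary columns (that is, which assumed attack type) explains the perturbation, while `generator(z, W)` gives the estimate of the clean signal. Step sizes are left to the caller; the c-update is a standard proximal gradient step and is non-increasing in the objective when `step_c` is at most the reciprocal of the squared spectral norm of `D`.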