Most of the approaches proposed so far to craft targeted adversarial examples against Deep Learning classifiers are highly suboptimal and typically rely on increasing the likelihood of the target class, thus implicitly focusing on one-hot encoding settings. In this paper, we propose a more general, theoretically sound, targeted attack based on the minimization of a Jacobian-induced MAhalanobis distance (JMA) term, which accounts for the effort (in the input space) required to move the latent space representation of the input sample in a given direction. The minimization is solved by exploiting the Wolfe duality theorem, which reduces the problem to a Non-Negative Least Squares (NNLS) problem. The proposed algorithm provides an optimal solution to a linearized version of the adversarial example problem originally introduced by Szegedy et al. \cite{szegedy2013intriguing}. The experiments we carried out confirm the generality of the proposed attack, which proves effective under a wide variety of output encoding schemes. Notably, the JMA attack is also effective in multi-label classification: in a complex scenario with 20 labels, it can induce a targeted modification of up to half the labels, a capability that is out of reach of all the attacks proposed so far. As a further advantage, the JMA attack usually requires very few iterations, making it more efficient than existing methods.
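As an informal illustration of why a Jacobian can induce a Mahalanobis-type distance, consider the linearized model $f(x + \delta) \approx f(x) + J\delta$. For a desired output displacement $d$, the smallest-norm input perturbation satisfying $J\delta = d$ is $\delta = J^\top (JJ^\top)^{-1} d$, and its squared norm is $d^\top (JJ^\top)^{-1} d$. The sketch below is a minimal toy under stated assumptions, not the authors' implementation: the 2x3 Jacobian, the displacement vectors, and all function names are hypothetical, and the constrained step that the abstract solves via Wolfe duality and NNLS is omitted.

```python
# Toy sketch (assumed values, not from the paper): input-space effort needed
# to move the output of a linearized model by a displacement d.

def matvec(A, v):
    """Multiply matrix A (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def transpose(A):
    """Return the transpose of matrix A."""
    return [list(col) for col in zip(*A)]

def gram(J):
    """Return the Gram matrix J J^T."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in J] for r1 in J]

def inv2x2(M):
    """Invert a 2x2 matrix; raise if it is numerically singular."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("J J^T is singular; the Jacobian rows are dependent")
    return [[d / det, -b / det], [-c / det, a / det]]

def min_norm_perturbation(J, d):
    """Smallest-L2 delta with J delta = d, i.e. delta = J^T (J J^T)^{-1} d."""
    lam = matvec(inv2x2(gram(J)), d)
    return matvec(transpose(J), lam)

def jacobian_mahalanobis_sq(J, d):
    """Squared input-space effort d^T (J J^T)^{-1} d for displacement d."""
    lam = matvec(inv2x2(gram(J)), d)
    return sum(di * li for di, li in zip(d, lam))

# Hypothetical Jacobian: output 0 is very sensitive to the input,
# output 1 is ten times less sensitive.
J = [[1.0, 0.0, 0.0],
     [0.0, 0.1, 0.0]]

d_easy = [1.0, 0.0]  # unit move along the sensitive output direction
d_hard = [0.0, 1.0]  # unit move along the insensitive output direction

cost_easy = jacobian_mahalanobis_sq(J, d_easy)
cost_hard = jacobian_mahalanobis_sq(J, d_hard)
delta_hard = min_norm_perturbation(J, d_hard)

print("effort for d_easy:", cost_easy)
print("effort for d_hard:", cost_hard)
print("perturbation for d_hard:", delta_hard)
```

Both displacements have unit Euclidean length in the output space, yet the second one costs far more in the input space. That asymmetry is what a plain Euclidean distance in the output space would miss.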