On-Manifold Projected Gradient Descent

This work provides a computable, direct, and mathematically rigorous approximation to the differential geometry of class manifolds for high-dimensional data, along with nonlinear projections from input space onto these class manifolds. We apply these tools to neural network image classifiers, where we generate novel on-manifold data samples and implement a projected gradient descent algorithm for on-manifold adversarial training. The susceptibility of neural networks (NNs) to adversarial attack highlights the brittle nature of NN decision boundaries in input space. Introducing adversarial examples during training has been shown to reduce this susceptibility; however, it has also been shown to reduce classifier accuracy when the examples are not valid members of their class. Realistic "on-manifold" examples have previously been generated from class manifolds in the latent space of an autoencoder. Our work explores these phenomena in a geometric and computational setting that is much closer to the raw, high-dimensional input space than a variational autoencoder (VAE) or other black-box dimensionality reduction can provide. We employ conformally invariant diffusion maps (CIDM) to approximate class manifolds in diffusion coordinates, and develop the Nyström projection to project novel points onto class manifolds in this setting. On top of the manifold approximation, we leverage the spectral exterior calculus (SEC) to determine geometric quantities such as tangent vectors of the manifold. We use these tools to obtain adversarial examples that reside on a class manifold yet fool a classifier. These misclassifications then become explainable in terms of human-understandable manipulations of the data, by expressing the on-manifold adversary in the semantic basis on the manifold.
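To make the idea of carrying a novel point into diffusion coordinates concrete, the following is a minimal sketch of the standard Nyström out-of-sample extension for a diffusion map. It is an illustration under stated assumptions, not the method of this work: it uses a fixed-bandwidth Gaussian kernel rather than the CIDM kernel, it covers only the map from input space into diffusion coordinates (not the projection back onto the class manifold), and the function names, the bandwidth `eps`, and the toy circle data are all hypothetical.

```python
import numpy as np

def gaussian_kernel(A, B, eps):
    # Pairwise kernel exp(-|a - b|^2 / eps); a fixed-bandwidth stand-in
    # for the CIDM kernel (assumption made for this sketch only).
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / eps)

def fit_diffusion_map(X, eps, n_coords):
    # Markov matrix P = D^{-1} K, diagonalized through its symmetric
    # conjugate S = D^{-1/2} K D^{-1/2}, which has the same eigenvalues.
    K = gaussian_kernel(X, X, eps)
    d = K.sum(axis=1)
    S = K / np.sqrt(np.outer(d, d))
    evals, evecs = np.linalg.eigh(S)
    order = np.argsort(evals)[::-1][: n_coords + 1]
    evals = evals[order]
    # Convert eigenvectors of S into right eigenvectors of P.
    phi = evecs[:, order] / np.sqrt(d)[:, None]
    # Drop the trivial constant eigenvector (eigenvalue 1).
    return evals[1:], phi[:, 1:]

def nystrom_extend(X, x_new, eps, evals, phi):
    # Nystrom extension: phi_j(x) = (1 / lambda_j) * sum_i p(x, x_i) phi_j(x_i),
    # where p is the row-normalized kernel between x and the training set.
    k = gaussian_kernel(np.atleast_2d(x_new), X, eps)
    p = k / k.sum(axis=1, keepdims=True)
    return (p @ phi) / evals

# Toy data: noisy samples of a circle embedded in 3D (hypothetical example).
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, size=200)
X = np.stack([np.cos(theta), np.sin(theta), 0.01 * rng.standard_normal(200)], axis=1)

evals, phi = fit_diffusion_map(X, eps=0.5, n_coords=2)

# Diffusion coordinates of a point that was not in the training set.
x_new = np.array([np.cos(0.3), np.sin(0.3), 0.0])
y_new = nystrom_extend(X, x_new, 0.5, evals, phi)
```

A useful sanity check on this construction is that extending a training point reproduces its own diffusion coordinates, because each column of `phi` is an eigenvector of the Markov matrix. Any projection onto the manifold, or a projected gradient step built on top of it, would require additional machinery that is not sketched here.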
