Machine learning models learn from data samples to carry out a variety of tasks efficiently. When data samples are adversarially manipulated, for example by inserting carefully crafted noise, the model can be led to make mistakes. Quantum machine learning models are also vulnerable to such adversarial attacks, especially in image classification using variational quantum classifiers. While there are promising defenses against these adversarial perturbations, such as training with adversarial samples, they face practical limitations. For example, they are not applicable in scenarios where training with adversarial samples is either not possible or would overfit the model to one type of attack. In this paper, we propose an adversarial training-free defense framework that uses a quantum autoencoder to purify adversarial samples through reconstruction. Moreover, our defense framework provides a confidence metric to identify potentially adversarial samples that cannot be purified by the quantum autoencoder. Extensive evaluation demonstrates that our defense framework can significantly outperform the state of the art in prediction accuracy (up to 68%) under adversarial attacks.