Adversarial attacks pose a potential threat to machine learning models by inducing incorrect predictions through imperceptible perturbations of the input data. While these attacks have been studied extensively on unstructured data such as images, applying them to tabular data poses new challenges. These challenges arise from the inherent heterogeneity and complex feature interdependencies of tabular data, which distinguish it from image data. Accounting for this distinction requires imperceptibility criteria tailored to tabular data. However, standardised metrics for assessing the imperceptibility of adversarial attacks on tabular data are currently lacking. To address this gap, we propose a set of key properties and corresponding metrics designed to comprehensively characterise imperceptible adversarial attacks on tabular data. These are: proximity to the original input; sparsity of altered features; deviation from the original data distribution; sensitivity in perturbing features with narrow distributions; immutability of features that should remain unchanged; feasibility of feature values, which should stay within valid practical ranges; and feature interdependencies, which capture complex relationships between data attributes. Using the proposed metrics, we evaluate the imperceptibility of five adversarial attacks on tabular data, including both bounded and unbounded attacks. The results reveal a trade-off between the imperceptibility and effectiveness of these attacks. The study also identifies limitations in current attack algorithms, offering insights that can guide future research and the design of adversarial attack algorithms, thereby advancing adversarial machine learning on tabular data.
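As an illustration only, several of the listed properties could be measured for a single original/adversarial pair along the following lines. The abstract does not state formulas, so every function name, definition, and example value below is an assumption made for this sketch, not the metric set proposed in the study. Distribution deviation and feature interdependencies are omitted because they require a reference dataset.

```python
import numpy as np

# Hypothetical per-sample imperceptibility checks. All definitions here are
# assumptions for illustration; the source abstract gives no formulas.

def proximity(x, x_adv, p=2):
    """Lp distance between the original and the perturbed sample."""
    return float(np.linalg.norm(x_adv - x, ord=p))

def sparsity(x, x_adv, tol=1e-8):
    """Number of features changed by more than tol."""
    return int(np.sum(np.abs(x_adv - x) > tol))

def sensitivity(x, x_adv, feature_std, eps=1e-12):
    """Perturbation scaled by per-feature standard deviation, so that
    changes to narrowly distributed features count for more."""
    return float(np.sum(np.abs(x_adv - x) / (feature_std + eps)))

def immutability_violations(x, x_adv, immutable_idx, tol=1e-8):
    """Number of features marked immutable that were nevertheless changed."""
    diff = np.abs(x_adv - x)
    return int(np.sum(diff[immutable_idx] > tol))

def feasibility_violations(x_adv, lower, upper):
    """Number of perturbed values outside their valid [lower, upper] range."""
    return int(np.sum((x_adv < lower) | (x_adv > upper)))

# Made-up example row: age, income, binary flag.
x = np.array([30.0, 50000.0, 1.0])
x_adv = np.array([30.0, 50500.0, 2.0])
feature_std = np.array([10.0, 20000.0, 0.5])
lower = np.array([18.0, 0.0, 0.0])
upper = np.array([100.0, 1e6, 1.0])

report = {
    "proximity_l2": proximity(x, x_adv),
    "sparsity": sparsity(x, x_adv),
    "sensitivity": sensitivity(x, x_adv, feature_std),
    "immutability_violations": immutability_violations(x, x_adv, [0]),
    "feasibility_violations": feasibility_violations(x_adv, lower, upper),
}
print(report)
```

In this made-up example the income change dominates the L2 proximity, while the small change to the binary flag dominates the scaled sensitivity score and is also the only range violation. This is one way to see why proximity alone may be a poor proxy for imperceptibility on heterogeneous features.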