The vulnerability of deep learning models to small, imperceptible attacks limits their adoption in real-world systems. Adversarial training has proven to be one of the most promising defenses against these attacks, but it comes at the expense of a substantial increase in training time. With the ongoing trend of integrating large-scale synthetic data, this cost is expected to grow even further. This creates a need for data-centric approaches that reduce the number of training samples while maintaining accuracy and robustness. While data pruning and active learning are prominent research topics in deep learning, they remain largely unexplored in the adversarial training literature. We address this gap and propose a new data pruning strategy based on extrapolating data importance scores from a small set of data to a larger set. In an empirical evaluation, we demonstrate that extrapolation-based pruning can efficiently reduce dataset size while maintaining robustness.
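To make the idea of extrapolating importance scores concrete, the following is a minimal sketch and not the method evaluated in this work. It assumes that each sample has a feature vector, that scores are known only for a small seed set, and that a score for an unscored sample can be estimated as the mean score of its k nearest seed samples. The nearest-neighbor rule, the choice to keep the highest-scoring samples, and the names `knn_extrapolate` and `prune` are all illustrative assumptions.

```python
import math


def knn_extrapolate(seed_feats, seed_scores, pool_feats, k=3):
    """Estimate a score for each pool sample as the mean score of its
    k nearest seed samples (Euclidean distance). Assumed rule, for illustration."""
    estimates = []
    for q in pool_feats:
        # Pair every seed sample's distance to q with its known score.
        by_distance = sorted(
            (math.dist(q, f), s) for f, s in zip(seed_feats, seed_scores)
        )
        nearest = by_distance[:k]
        estimates.append(sum(s for _, s in nearest) / len(nearest))
    return estimates


def prune(scores, keep_fraction):
    """Return the sorted indices of the highest-scoring samples.
    Keeping high scores is an assumption; some criteria keep low scores instead."""
    n_keep = max(1, round(keep_fraction * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])


# Toy data: four scored seed samples and four unscored pool samples.
seed_feats = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
seed_scores = [0.1, 0.3, 0.9, 0.7]
pool_feats = [(0.1, 0.2), (5.1, 5.4), (0.2, 0.9), (4.8, 5.9)]

pool_scores = knn_extrapolate(seed_feats, seed_scores, pool_feats, k=2)
kept = prune(pool_scores, keep_fraction=0.5)
print(kept)
```

In this toy setting the expensive scoring step runs only on the seed set, while the larger pool is ranked from the extrapolated estimates. In practice the features would presumably come from a trained model's embeddings, which the sketch leaves out.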