Text classifiers are vulnerable to small perturbations that, if chosen adversarially, can dramatically change the output of the model. Verification methods can provide robustness certificates against such adversarial perturbations by computing a sound lower bound on the robust accuracy. Nevertheless, existing verification methods incur prohibitive costs and cannot practically handle Levenshtein distance constraints. We propose the first method for computing the Lipschitz constant of convolutional classifiers with respect to the Levenshtein distance. We use these Lipschitz constant estimates to train 1-Lipschitz classifiers, which makes it possible to compute the certified radius of a classifier in a single forward pass. Our method, LipsLev, obtains $38.80$% and $13.93$% verified accuracy at distances $1$ and $2$, respectively, on the AG-News dataset, while being $4$ orders of magnitude faster than existing approaches. We believe our work can open the door to more efficient verification in the text domain.
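To illustrate why a Lipschitz bound yields a certificate in a single forward pass, the following is a minimal sketch, not the LipsLev implementation. It assumes that every pairwise margin between output scores is `lipschitz`-Lipschitz with respect to the Levenshtein distance, so a margin of $m$ cannot be closed by fewer than $m / K$ edits. The function names `levenshtein` and `certified_radius`, and the convention of returning `-1` for a misclassified input, are hypothetical choices made for this example.

```python
import math


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def certified_radius(scores, label, lipschitz=1.0):
    """Largest integer k such that no input within Levenshtein distance k
    can change the prediction, ASSUMING each pairwise margin is
    `lipschitz`-Lipschitz w.r.t. the Levenshtein distance.
    Returns -1 if the input is already misclassified (hypothetical convention).
    """
    runner_up = max(s for i, s in enumerate(scores) if i != label)
    margin = scores[label] - runner_up
    if margin <= 0:
        return -1
    # Need lipschitz * k < margin with k an integer, i.e. k = ceil(margin / K) - 1.
    return math.ceil(margin / lipschitz) - 1
```

Under these assumptions, an example counts toward verified accuracy at distance $k$ when `certified_radius(scores, label) >= k`, which needs only the scores from one forward pass; `levenshtein` is included only to make the distance in the constraint concrete.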