This paper presents a key insight: a one-layer decoder-only Transformer is equivalent to a two-layer recurrent neural network (RNN). Building on this insight, we propose ARC-Tran, a novel approach for verifying the robustness of decoder-only Transformers against arbitrary perturbation spaces. Current robustness verification techniques are limited either to specific, length-preserving perturbations such as word substitutions or to recurrent models such as LSTMs. ARC-Tran overcomes these limitations by carefully managing position encoding to prevent mismatches and by exploiting our key insight to achieve precise and scalable verification. Our evaluation shows that ARC-Tran (1) trains models more robust to arbitrary perturbation spaces than those produced by existing techniques and (2) achieves high certification accuracy on the resulting models.
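To make the Transformer-as-RNN view concrete, the sketch below (an illustration under our own assumptions, not the paper's formal construction) evaluates single-head causal self-attention, the core of a one-layer decoder-only Transformer, in two ways: the usual batch form over the full masked score matrix, and a step-by-step recurrence that carries only a small running state (numerator, denominator, running max) per query, in the spirit of an RNN:

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 5, 4                       # sequence length, head dimension
Q = rng.standard_normal((T, d))   # queries
K = rng.standard_normal((T, d))   # keys
V = rng.standard_normal((T, d))   # values

# Batch form: full causal softmax attention over the masked score matrix.
scores = Q @ K.T / np.sqrt(d)
mask = np.tril(np.ones((T, T), dtype=bool))       # causal mask
scores = np.where(mask, scores, -np.inf)
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)
batch_out = weights @ V

# Recurrent form: for each query t, scan keys 0..t while updating an
# online-softmax state (running max m, weighted sum num, normalizer den).
rec_out = np.zeros_like(batch_out)
for t in range(T):
    num, den, m = np.zeros(d), 0.0, -np.inf       # RNN-style running state
    for j in range(t + 1):
        s = Q[t] @ K[j] / np.sqrt(d)
        m_new = max(m, s)
        scale = np.exp(m - m_new)                 # rescale old state
        num = num * scale + np.exp(s - m_new) * V[j]
        den = den * scale + np.exp(s - m_new)
        m = m_new
    rec_out[t] = num / den

assert np.allclose(batch_out, rec_out)
```

The two computations agree, showing that causal attention admits a state-passing formulation; the paper's two-layer RNN equivalence builds a precise version of this correspondence for the full decoder layer.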