Although Transformers have achieved promising results in many computer vision tasks, they tend to be over-confident in their predictions, because the standard Dot Product Self-Attention (DPSA) cannot reliably preserve distances over an unbounded input domain. In this work, we address this limitation by proposing a novel Lipschitz Regularized Transformer (LRFormer). Specifically, we introduce a new similarity function based on a distance in a Banach space, which ensures Lipschitz continuity, and we regularize this term with a contractive Lipschitz bound. We analyze the proposed method and provide a theoretical guarantee, giving a rigorous basis for its effectiveness and reliability. Extensive experiments on standard vision benchmarks demonstrate that our method outperforms state-of-the-art single-forward-pass approaches in prediction, calibration, and uncertainty estimation.
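To illustrate the contrast drawn above between dot-product similarity and a distance-based similarity, here is a minimal NumPy sketch. It is not the paper's exact formulation: the choice of the L2 norm, the scale parameter `alpha`, and all function names are assumptions made for illustration only, and the contractive Lipschitz bound used as a regularizer is not modeled here.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def dot_product_scores(q, k):
    # Standard scaled dot-product similarity. Scores grow without bound
    # as the inputs grow, and they change when inputs are translated.
    return q @ k.T / np.sqrt(q.shape[-1])


def distance_scores(q, k, alpha=1.0):
    # Hypothetical distance-based similarity: negative pairwise L2 distance.
    # `alpha` is an illustrative scale, not a parameter taken from the paper.
    # By the reverse triangle inequality, each score is alpha-Lipschitz in q.
    diff = q[:, None, :] - k[None, :, :]
    return -alpha * np.linalg.norm(diff, axis=-1)


def attention(scores, v):
    # Row-wise softmax over the scores, then a weighted sum of the values.
    return softmax(scores) @ v


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))   # 4 tokens, 8 features each
v = rng.normal(size=(4, 3))   # value vectors

out = attention(distance_scores(x, x), v)
print(out.shape)
```

In this sketch, shifting every token by the same offset leaves the distance-based scores unchanged, whereas the dot-product scores change. This is one simple way to see why a distance-based similarity tracks the geometry of the inputs more faithfully.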