We propose Dual PatchNorm: two Layer Normalization layers (LayerNorms) placed before and after the patch embedding layer in Vision Transformers. We demonstrate that Dual PatchNorm outperforms the best configuration found by an exhaustive search over alternative LayerNorm placement strategies in the Transformer block itself. In our experiments, incorporating this trivial modification often improves accuracy over well-tuned Vision Transformers and never hurts.
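The placement described above can be illustrated with a minimal NumPy sketch: flatten the image into patches, apply a LayerNorm, apply the linear patch embedding, then apply a second LayerNorm. This is an illustrative reading of the abstract, not the authors' implementation. The function names (`layer_norm`, `extract_patches`, `dual_patchnorm_embed`, `init_params`), the parameter dictionary layout, the `eps` value, and the initialization scale are all assumptions made for this example. Class tokens and position embeddings are omitted.

```python
import numpy as np


def layer_norm(x, gamma, beta, eps=1e-6):
    """Normalize over the last axis, then apply a learned scale and shift."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps) * gamma + beta


def extract_patches(images, patch):
    """Split (B, H, W, C) images into flattened non-overlapping patches.

    Returns an array of shape (B, N, patch * patch * C), where
    N = (H // patch) * (W // patch). H and W must be divisible by patch.
    """
    b, h, w, c = images.shape
    gh, gw = h // patch, w // patch
    x = images.reshape(b, gh, patch, gw, patch, c)
    x = x.transpose(0, 1, 3, 2, 4, 5)  # (B, gh, gw, patch, patch, C)
    return x.reshape(b, gh * gw, patch * patch * c)


def init_params(rng, patch_dim, embed_dim):
    """Hypothetical parameter layout; the 0.02 init scale is an assumption."""
    return {
        "ln0_gamma": np.ones(patch_dim),
        "ln0_beta": np.zeros(patch_dim),
        "w": rng.normal(size=(patch_dim, embed_dim)) * 0.02,
        "b": np.zeros(embed_dim),
        "ln1_gamma": np.ones(embed_dim),
        "ln1_beta": np.zeros(embed_dim),
    }


def dual_patchnorm_embed(images, params, patch):
    """LayerNorm -> linear patch embedding -> LayerNorm."""
    x = extract_patches(images, patch)
    x = layer_norm(x, params["ln0_gamma"], params["ln0_beta"])  # before embedding
    x = x @ params["w"] + params["b"]                           # patch embedding
    x = layer_norm(x, params["ln1_gamma"], params["ln1_beta"])  # after embedding
    return x


rng = np.random.default_rng(0)
images = rng.normal(size=(2, 4, 4, 3))           # two 4x4 RGB images
params = init_params(rng, patch_dim=2 * 2 * 3, embed_dim=8)
tokens = dual_patchnorm_embed(images, params, patch=2)
print(tokens.shape)
```

In this sketch the output holds one token per patch, and those tokens would be passed to an otherwise unmodified Transformer encoder. The only change relative to a standard patch embedding is the two `layer_norm` calls around the linear projection.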