We propose Dual PatchNorm: two Layer Normalization layers (LayerNorms), placed before and after the patch embedding layer in Vision Transformers. We demonstrate that Dual PatchNorm outperforms the result of an exhaustive search over alternative LayerNorm placement strategies within the Transformer block itself. In our experiments, incorporating this trivial modification often leads to improved accuracy over well-tuned Vision Transformers and never hurts.
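To make the placement concrete, the following is a minimal NumPy sketch of a patch embedding wrapped by two LayerNorms, one applied to the flattened patches and one applied to the projected tokens. It is an illustration under stated assumptions, not the authors' implementation: the function and parameter names (`patchify`, `dual_patchnorm_embed`, `ln0_gamma`, and so on), the `eps` value, the weight initialization, and the channels-last image layout are all hypothetical choices. Position embeddings and the rest of the Vision Transformer are omitted.

```python
import numpy as np


def layer_norm(x, gamma, beta, eps=1e-6):
    """Normalize over the last axis, then apply a learned scale and shift."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps) * gamma + beta


def patchify(images, patch_size):
    """Split (B, H, W, C) images into flattened non-overlapping patches.

    Returns an array of shape (B, num_patches, patch_size * patch_size * C).
    Assumes H and W are divisible by patch_size.
    """
    b, h, w, c = images.shape
    p = patch_size
    x = images.reshape(b, h // p, p, w // p, p, c)
    x = x.transpose(0, 1, 3, 2, 4, 5)  # (B, H/p, W/p, p, p, C)
    return x.reshape(b, (h // p) * (w // p), p * p * c)


def init_params(rng, patch_dim, embed_dim):
    """Hypothetical parameter layout; initialization values are assumptions."""
    return {
        "ln0_gamma": np.ones(patch_dim),
        "ln0_beta": np.zeros(patch_dim),
        "w": rng.normal(0.0, 0.02, size=(patch_dim, embed_dim)),
        "b": np.zeros(embed_dim),
        "ln1_gamma": np.ones(embed_dim),
        "ln1_beta": np.zeros(embed_dim),
    }


def dual_patchnorm_embed(images, params, patch_size):
    """LayerNorm -> linear patch embedding -> LayerNorm."""
    x = patchify(images, patch_size)
    # First LayerNorm: over the raw flattened patch values.
    x = layer_norm(x, params["ln0_gamma"], params["ln0_beta"])
    # Standard linear patch embedding.
    x = x @ params["w"] + params["b"]
    # Second LayerNorm: over the embedding dimension.
    return layer_norm(x, params["ln1_gamma"], params["ln1_beta"])


rng = np.random.default_rng(0)
images = rng.normal(size=(2, 32, 32, 3))  # toy batch, channels-last
params = init_params(rng, patch_dim=16 * 16 * 3, embed_dim=64)
tokens = dual_patchnorm_embed(images, params, patch_size=16)
print(tokens.shape)  # (2, 4, 64)
```

In this sketch, the only difference from a plain patch embedding is the two `layer_norm` calls; removing both recovers the usual linear projection of flattened patches.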