Due to their architecture and training procedure, artificial neural networks are typically not robust to pruning, replacing, or shuffling layers at test time. However, such robustness would be desirable in several applications, such as distributed neural network architectures where the order of execution cannot be guaranteed or where parts of the network can fail during inference. In this work, we address these issues through a number of proposed training approaches for vision transformers, the most important of which is randomizing the execution order of attention modules at training time. We show that with our proposed approaches, vision transformers are indeed capable of adapting to arbitrary layer execution orders at test time, assuming one tolerates a reduction (about 20\%) in accuracy at the same model size. We also find that our trained models can be randomly merged with each other, resulting in functional ("Frankenstein") models without loss of performance compared to the source models. Finally, we layer-prune our models at test time and find that their performance declines gracefully.
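The core mechanism, randomizing the order in which blocks execute at each training step, can be illustrated with a minimal sketch. This is an assumption-laden toy, not the paper's implementation: the `make_block`, `forward`, and `shuffled_forward` names are hypothetical, and simple scalar functions stand in for transformer attention modules.

```python
import random

def make_block(scale, shift):
    # Toy stand-in for a transformer block (hypothetical; real blocks
    # would be attention/MLP modules with residual connections).
    def block(x):
        return x + scale * x + shift
    return block

def forward(blocks, x, order):
    # Execute the blocks in the given order (a permutation of indices).
    for i in order:
        x = blocks[i](x)
    return x

def shuffled_forward(blocks, x, rng):
    # Training-time layer-order randomization: sample a fresh random
    # permutation of block indices for this step, so the model cannot
    # rely on any fixed execution order.
    order = list(range(len(blocks)))
    rng.shuffle(order)
    return forward(blocks, x, order), order
```

Because the toy blocks do not commute (swapping their order changes the output), running them in shuffled order during training is what would force a real model to become order-invariant.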