We analyze the operation of transformer language adapters, which are small modules trained on top of a frozen language model to adapt its predictions to new target languages. We show that adapted predictions mostly evolve in the source language the model was trained on, while the target language becomes pronounced only in the model's last few layers. Moreover, the adaptation process is gradual and distributed across layers, such that small groups of adapters can be skipped without decreasing adaptation performance. Last, we show that adapters operate on top of the model's frozen representation space while largely preserving its structure, rather than in an 'isolated' subspace. Our findings provide a deeper view into the adaptation process of language models to new languages, showcasing the constraints the underlying model imposes on it, and introduce practical implications for enhancing its efficiency.
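To make the setup concrete, below is a minimal sketch of the kind of architecture described above: a small trainable bottleneck adapter placed on top of a frozen transformer layer, in the style commonly used for language adaptation. The class names, bottleneck size, and placement after each layer are illustrative assumptions, not the exact configuration analyzed here.

```python
import torch
import torch.nn as nn


class BottleneckAdapter(nn.Module):
    """A small residual bottleneck module, as typically used for language adaptation.

    The hidden size and bottleneck size here are illustrative; the key property is
    that the residual connection lets the adapter modify the frozen model's
    representations without replacing them.
    """

    def __init__(self, hidden_size: int, bottleneck_size: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck_size)  # down-projection
        self.up = nn.Linear(bottleneck_size, hidden_size)    # up-projection
        self.act = nn.GELU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Residual update on top of the frozen representation space.
        return hidden_states + self.up(self.act(self.down(hidden_states)))


class AdaptedLayer(nn.Module):
    """Wraps a frozen transformer layer with a trainable adapter on its output."""

    def __init__(self, frozen_layer: nn.Module, hidden_size: int):
        super().__init__()
        self.layer = frozen_layer
        for p in self.layer.parameters():
            p.requires_grad = False  # the underlying language model stays frozen
        self.adapter = BottleneckAdapter(hidden_size)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Only the adapter parameters receive gradients during target-language training.
        return self.adapter(self.layer(hidden_states))
```

In this sketch, only the adapter parameters are updated when training on the target language; skipping an adapter simply amounts to returning the frozen layer's output unchanged, which is the kind of intervention the layer-skipping analysis above refers to.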