The knowledge encapsulated in a model is the core factor determining its final performance on downstream tasks. Much research in NLP has focused on efficient methods for storing and adapting different types of knowledge, e.g., in dedicated modularized structures, and on how to effectively combine these, e.g., by learning additional parameters. However, given the many possible options, a thorough understanding of the mechanisms involved in these compositions is missing, and hence it remains unclear which strategies to use. To address this research gap, we propose a novel framework for zero-shot module composition, which encompasses existing and novel variations for selecting, weighting, and combining parameter modules under a single unified notion. Focusing on the scenario of domain knowledge and adapter layers, our framework provides a systematic unification of concepts, allowing us to conduct the first comprehensive benchmarking study of various zero-shot knowledge composition strategies. In particular, we test two module combination methods and five selection and weighting strategies for their effectiveness and efficiency in an extensive experimental setup. Our results highlight the efficacy of ensembling but also hint at the power of simple, often-overlooked weighting methods. Further in-depth analyses allow us to understand the role of weighting vs. top-k selection, and show that, to a certain extent, the performance of adapter composition can even be predicted.
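To make the two composition axes concrete, the following is a minimal toy sketch of the distinction between combining module outputs (ensembling) and combining module parameters (averaging), with scalar weights standing in for a selection/weighting strategy. The function names and the modeling of adapters as plain linear maps are illustrative assumptions, not the framework's actual API.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden = 4
# Three hypothetical per-domain adapter modules, modeled as linear maps.
adapters = [rng.normal(size=(hidden, hidden)) for _ in range(3)]
# Illustrative weights, e.g. produced by a domain-similarity scorer.
weights = np.array([0.5, 0.3, 0.2])

def output_ensemble(x, adapters, weights):
    """Combine module *outputs*: weighted sum of each adapter's output."""
    return sum(w * (A @ x) for w, A in zip(weights, adapters))

def parameter_average(x, adapters, weights):
    """Combine module *parameters*: apply the weighted-average adapter."""
    A_avg = sum(w * A for w, A in zip(weights, adapters))
    return A_avg @ x

x = rng.normal(size=hidden)
# For purely linear modules the two strategies coincide; with the
# nonlinearities of real adapter layers they generally differ.
print(np.allclose(output_ensemble(x, adapters, weights),
                  parameter_average(x, adapters, weights)))
```

A top-k selection strategy in this picture simply zeroes out all but the k largest weights before combining, which is why weighting and selection can be studied as instances of one unified scheme.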