The knowledge encapsulated in a model is the core factor determining its final performance on downstream tasks. Much research in NLP has focused on efficient methods for storing and adapting different types of knowledge, e.g., in dedicated modularized structures, and on how to combine these effectively, e.g., by learning additional parameters. However, given the many possible options, a thorough understanding of the mechanisms involved in these compositions is still missing, and hence it remains unclear which strategies to use. To address this research gap, we propose a novel framework for zero-shot module composition, which encompasses existing and some novel variations for selecting, weighting, and combining parameter modules under a single unified notion. Focusing on the scenario of domain knowledge and adapter layers, our framework provides a systematic unification of concepts, allowing us to conduct the first comprehensive benchmarking study of various zero-shot knowledge composition strategies. In particular, we test two module combination methods and five selection and weighting strategies for their effectiveness and efficiency in an extensive experimental setup. Our results highlight the efficacy of ensembling but also hint at the power of simple though often-ignored weighting methods. Further in-depth analyses allow us to understand the role of weighting vs. top-k selection, and show that, to a certain extent, the performance of adapter composition can even be predicted.
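To make the select, weight, and combine steps concrete, the following is a minimal sketch and not the implementation evaluated in this work. It assumes each module comes with a relevance score for the target domain (how that score is computed is left open), keeps the top-k modules, normalizes their scores into weights, and then combines the modules either in parameter space (weighted averaging, an assumed choice for the second combination method) or in output space (ensembling). All function names and the toy modules are hypothetical.

```python
from typing import Callable, Dict, List

Vector = List[float]


def top_k_weights(scores: Dict[str, float], k: int) -> Dict[str, float]:
    """Keep the k highest-scoring modules and normalize their scores to sum to 1."""
    if k <= 0 or not scores:
        raise ValueError("need k > 0 and at least one scored module")
    selected = sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]
    total = sum(score for _, score in selected)
    if total <= 0:
        # Assumption: fall back to uniform weights when scores carry no usable signal.
        return {name: 1.0 / len(selected) for name, _ in selected}
    return {name: score / total for name, score in selected}


def average_parameters(params: Dict[str, Vector], weights: Dict[str, float]) -> Vector:
    """Parameter-space combination: one weighted-average parameter vector."""
    names = list(weights)
    dim = len(params[names[0]])
    return [sum(weights[n] * params[n][i] for n in names) for i in range(dim)]


def ensemble_outputs(
    modules: Dict[str, Callable[[Vector], Vector]],
    weights: Dict[str, float],
    x: Vector,
) -> Vector:
    """Output-space combination: weighted average of the selected modules' outputs."""
    outputs = {n: modules[n](x) for n in weights}
    dim = len(next(iter(outputs.values())))
    return [sum(weights[n] * outputs[n][i] for n in weights) for i in range(dim)]


# Toy usage with hypothetical domain modules and made-up relevance scores.
scores = {"news": 3.0, "reviews": 1.0, "legal": 0.5}
weights = top_k_weights(scores, k=2)  # "legal" is dropped by top-k selection

params = {"news": [1.0, 0.0], "reviews": [0.0, 1.0], "legal": [5.0, 5.0]}
merged = average_parameters(params, weights)

modules = {
    "news": lambda x: [2.0 * v for v in x],
    "reviews": lambda x: [v + 1.0 for v in x],
    "legal": lambda x: [0.0 for _ in x],
}
ensembled = ensemble_outputs(modules, weights, [1.0, 2.0])
```

Setting `k` to the number of available modules isolates the effect of weighting, while passing equal scores isolates the effect of top-k selection. Note that ensembling runs every selected module at inference time, whereas parameter averaging produces a single merged module.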