Transfer learning has recently become the dominant paradigm of machine learning. Pre-trained models fine-tuned for downstream tasks achieve better performance with fewer labelled examples. Nonetheless, it remains unclear how to develop models that specialise towards multiple tasks without incurring negative interference and that generalise systematically to non-identically distributed tasks. Modular deep learning has emerged as a promising solution to these challenges. In this framework, units of computation are often implemented as autonomous parameter-efficient modules. Information is conditionally routed to a subset of modules and subsequently aggregated. These properties enable positive transfer and systematic generalisation by separating computation from routing and updating modules locally. We offer a survey of modular architectures, providing a unified view over several threads of research that evolved independently in the scientific literature. Moreover, we explore various additional purposes of modularity, including scaling language models, causal inference, programme induction, and planning in reinforcement learning. Finally, we report various concrete applications where modularity has been successfully deployed, such as cross-lingual and cross-modal knowledge transfer. Talks and projects related to this survey are available at https://www.modulardeeplearning.com/.
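To make the routing-and-aggregation idea concrete, the following is a minimal sketch and not the method of any particular system covered by the survey. It assumes low-rank residual modules as one possible form of parameter-efficient module, a linear router with top-k selection, and softmax-weighted averaging as the aggregation function. All names (`LowRankModule`, `route`, `modular_layer`) and all sizes are hypothetical and chosen only for illustration.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class LowRankModule:
    """Hypothetical parameter-efficient module: x + up(down(x)) with a small rank."""

    def __init__(self, dim, rank, rng):
        # Two thin matrices hold 2 * dim * rank parameters instead of dim * dim.
        self.down = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(rank)]
        self.up = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(dim)]

    def __call__(self, x):
        delta = matvec(self.up, matvec(self.down, x))
        return [xi + di for xi, di in zip(x, delta)]


def route(x, router_weights, top_k):
    """Routing: pick the top_k highest-scoring modules and normalise their scores."""
    scores = matvec(router_weights, x)
    chosen = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    weights = softmax([scores[i] for i in chosen])
    return list(zip(chosen, weights))


def modular_layer(x, modules, router_weights, top_k=2):
    """Computation and aggregation: run only the selected modules, then mix outputs."""
    selection = route(x, router_weights, top_k)
    output = [0.0] * len(x)
    for index, weight in selection:
        module_out = modules[index](x)
        output = [o + weight * m for o, m in zip(output, module_out)]
    return output, selection


# Assumed toy sizes: 4-dimensional input, rank-2 modules, 3 modules in total.
rng = random.Random(0)
dim, rank, num_modules = 4, 2, 3
modules = [LowRankModule(dim, rank, rng) for _ in range(num_modules)]
router_weights = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_modules)]
output, selection = modular_layer([1.0, 0.5, -0.5, 2.0], modules, router_weights)
```

In this sketch the router and the modules hold separate parameters, which mirrors the separation of computation from routing, and only the selected modules are evaluated for a given input. Learned routing, fixed task-specific routing, and other aggregation functions could be substituted without changing the overall structure.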