This article summarizes principles and ideas from the emerging area of applying \textit{conditional computation} methods to the design of neural networks. In particular, we focus on neural networks that can dynamically activate or deactivate parts of their computational graph conditioned on their input. Examples include the dynamic selection of input tokens, of layers (or sets of layers), and of sub-modules inside each layer (e.g., channels in a convolutional filter). We first provide a general formalism to describe these techniques in a uniform way. Then, we introduce three notable implementations of these principles: mixture-of-experts (MoE) networks, token selection mechanisms, and early-exit neural networks. The paper aims to provide a tutorial-like introduction to this growing field. To this end, we analyze the benefits of these modular designs in terms of efficiency, explainability, and transfer learning, with a focus on emerging application areas ranging from automated scientific discovery to semantic communication.