With the advent of deep learning, progressively larger neural networks have been designed to solve complex tasks. We take advantage of these capacity-rich models to lower the cost of inference by exploiting computation in superposition. To reduce the computational burden per input, we propose Multiple-Input-Multiple-Output Neural Networks (MIMONets) capable of handling many inputs at once. MIMONets augment various deep neural network architectures with variable binding mechanisms to represent an arbitrary number of inputs in a compositional data structure via fixed-width distributed representations. Accordingly, MIMONets adapt nonlinear neural transformations to process the data structure holistically, leading to a speedup nearly proportional to the number of superposed input items in the data structure. After processing in superposition, an unbinding mechanism recovers each transformed input of interest. MIMONets also provide a dynamic trade-off between accuracy and throughput through instantaneous, on-demand switching between a set of accuracy-throughput operating points, all within a single set of fixed parameters. We apply the concept of MIMONets to both CNN and Transformer architectures, resulting in MIMOConv and MIMOFormer, respectively. Empirical evaluations show that MIMOConv achieves a speedup of about 2-4× at an accuracy delta within [+0.68, -3.18]% compared to WideResNet CNNs on CIFAR10 and CIFAR100. Similarly, MIMOFormer can handle 2-4 inputs at once while keeping the average accuracy delta within [-1.07, -3.43]% on the Long Range Arena (LRA) benchmark. Finally, we provide mathematical bounds on the interference between superposition channels in MIMOFormer. Our code is available at https://github.com/IBM/multiple-input-multiple-output-nets.
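The bind, superpose, and unbind cycle described in the abstract can be illustrated with a toy sketch. It is not taken from the MIMONets code base. It assumes one simple binding scheme (elementwise multiplication with random bipolar keys) chosen purely for illustration, and it omits the neural network that would process the superposed representation. All function names are hypothetical.

```python
import math
import random


def random_bipolar_key(dim, rng):
    """Draw a random key with entries in {-1, +1} (assumed binding key format)."""
    return [rng.choice((-1.0, 1.0)) for _ in range(dim)]


def bind(key, vector):
    """Bind a vector to a key by elementwise multiplication."""
    return [k * v for k, v in zip(key, vector)]


def superpose(vectors):
    """Sum the bound vectors into a single fixed-width representation."""
    return [sum(components) for components in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def demo(dim=2048, num_inputs=3, seed=0):
    """Return a matrix whose entry [i][j] compares unbound item i with input j."""
    rng = random.Random(seed)
    inputs = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_inputs)]
    keys = [random_bipolar_key(dim, rng) for _ in range(num_inputs)]

    # One vector of width `dim` now carries all inputs at once.
    mixture = superpose([bind(k, x) for k, x in zip(keys, inputs)])

    # A bipolar key is its own inverse under elementwise multiplication,
    # so unbinding reuses bind(); the other items remain as crosstalk noise.
    recovered = [bind(k, mixture) for k in keys]
    return [[cosine(r, x) for x in inputs] for r in recovered]


similarities = demo()
for row in similarities:
    print([round(value, 2) for value in row])
```

In this sketch, each unbound vector should be clearly more similar to its own input than to the others, while the residual crosstalk grows as more inputs share the same vector. That behaviour is in line with the accuracy-throughput trade-off the abstract describes, although the actual binding operations and interference bounds are those given in the paper and its repository.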