As more scientific fields rely on neural networks (NNs) to process data arriving at extreme throughputs and under extremely tight latency budgets, it is crucial to develop NNs whose parameters are all stored on-chip. In many of these applications, there is not enough time to go off-chip and retrieve weights. Moreover, off-chip memory such as DRAM does not have the bandwidth required to process these NNs as fast as the data is being produced (e.g., every 25 ns). These latency and bandwidth requirements have architectural implications for the hardware intended to run these NNs: 1) all NN parameters must fit on-chip, and 2) codesigning custom or reconfigurable logic is often required to meet the constraints. In our work, we show that many scientific NN applications must run fully on-chip, and that the most extreme cases require a custom chip to meet such stringent constraints.
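As a rough illustration of the bandwidth argument, the sketch below estimates the memory bandwidth needed if every weight had to be fetched from off-chip memory once per inference. Only the 25 ns interval comes from the text. The parameter count, the 8-bit precision, and the DRAM figure (the 25.6 GB/s theoretical peak of a single DDR4-3200 channel) are assumptions chosen for illustration, not values from this work.

```python
def required_weight_bandwidth(num_params: int, bits_per_param: int,
                              interval_ns: float) -> float:
    """Bytes per second needed to read every parameter once per interval."""
    bytes_per_inference = num_params * bits_per_param / 8
    return bytes_per_inference / (interval_ns * 1e-9)


# Hypothetical model size and precision (assumptions, not from the source).
NUM_PARAMS = 100_000
BITS_PER_PARAM = 8

# One new input every 25 ns, as in the example given in the text.
INTERVAL_NS = 25.0

# Assumed off-chip reference point: peak of one DDR4-3200 channel.
DRAM_PEAK_BYTES_PER_S = 25.6e9

needed = required_weight_bandwidth(NUM_PARAMS, BITS_PER_PARAM, INTERVAL_NS)
shortfall = needed / DRAM_PEAK_BYTES_PER_S

print(f"required weight bandwidth: {needed / 1e12:.2f} TB/s")
print(f"ratio to assumed DRAM peak: {shortfall:.0f}x")
```

Under these assumptions, even a small network would need terabytes per second of weight traffic, which is orders of magnitude beyond the assumed DRAM channel. This is consistent with the requirement that all parameters reside on-chip.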