Bayesian Neural Networks (BNNs) provide superior uncertainty estimates by generating an ensemble of predictive distributions. However, inference via ensembling is resource-intensive, requiring additional entropy sources to generate stochasticity, which further increases resource consumption. We introduce Bayes2IMC, an in-memory computing (IMC) architecture for binary Bayesian neural networks that leverages nanoscale device stochasticity to generate the desired distributions. Our approach harnesses the inherent noise characteristics of Phase-Change Memory (PCM) to realize a binary neural network. This design eliminates the need for a pre-neuron analog-to-digital converter (ADC), significantly improving power and area efficiency. We also develop a hardware-software co-optimized correction method, applied solely to the logits of the final layer, to reduce device-induced accuracy variations across hardware deployments. Additionally, we devise a simple compensation technique that ensures no drop in classification accuracy despite the conductance drift of PCM. We validate the effectiveness of our approach on the CIFAR-10 dataset with a VGGBinaryConnect model, achieving accuracy comparable to ideal software implementations as well as to results reported in the literature using other technologies. Finally, we present a complete core architecture and compare its projected power, performance, and area efficiency against an equivalent SRAM baseline, showing a $3.8\times$ to $9.6\times$ improvement in total efficiency (in GOPS/W/mm$^2$) and a $2.2\times$ to $5.6\times$ improvement in power efficiency (in GOPS/W). In addition, the projected hardware performance of Bayes2IMC surpasses that of most BNN architectures based on memristive devices reported in the literature, achieving up to $20\%$ higher power efficiency than the state of the art.