Design and Implementation of Hardware Accelerators for Neural Processing Applications

The primary motivation for this work was the need to implement hardware accelerators for a newly proposed ANN structure, the Auto Resonance Network (ARN), for robotic motion planning. ARN is an approximating, feed-forward, hierarchical, and explainable network. It can be used in various AI applications, but its application base has been small. The objective of the research was therefore twofold: to develop a new application using ARN and to implement a hardware accelerator for ARN. Following the suggestions of the Doctoral Committee, an image recognition system using ARN was implemented. It achieved an accuracy of around 94% with only 2 layers of ARN, and the network required only a small training set of about 500 images. The publicly available MNIST dataset was used for this experiment, and all the coding was done in Python.

The massive parallelism seen in ANNs presents several challenges to CPU design. For a given function, e.g., multiplication, several copies of a serial module can be realized within the same area as one parallel module. The advantage of using serial modules rather than parallel modules under area constraints is discussed. One module that is often useful in ANNs is multi-operand addition. A problem in implementing it is estimating the number of carry bits when the number of operands changes. The thesis presents a theorem for calculating the exact number of carry bits required for a multi-operand addition, which alleviates this problem. The main advantage of the modular approach to multi-operand addition is that it allows pipelined addition with low reconfiguration overhead. This increases overall throughput for the large numbers of additions typically seen in several DNN configurations.
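The carry-bit question can be illustrated with an elementary bound. The Python sketch below is not the theorem from the thesis, which is not reproduced here. It assumes unsigned operands of equal width and counts the extra high-order bits needed to hold the largest possible sum. The function name `carry_bits` and its parameters are hypothetical, chosen only for this illustration.

```python
def carry_bits(num_operands: int, operand_width: int) -> int:
    """Extra high-order bits needed to hold the sum of unsigned operands.

    Assumption (not from the thesis): all operands are unsigned and share
    the same bit width, so the worst case is every operand being all ones.
    """
    if num_operands < 1 or operand_width < 1:
        raise ValueError("num_operands and operand_width must be positive")

    # Largest value a single operand can take, e.g. 255 for 8 bits.
    max_operand = (1 << operand_width) - 1

    # Largest possible sum when every operand is at its maximum.
    max_sum = num_operands * max_operand

    # Bits needed for the sum, minus the bits the operands already occupy.
    return max_sum.bit_length() - operand_width


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 16, 17):
        print(f"{n:2d} operands of 8 bits -> {carry_bits(n, 8)} carry bits")
```

Under these assumptions, sixteen 8-bit operands need 4 carry bits while seventeen need 5, which shows why the result width has to be recomputed whenever the operand count changes.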

