Machine learning (ML) is a rapidly growing field of computer science. As computer hardware engineers, we are interested in hardware implementations of popular software ML architectures that optimize their performance, reliability, and resource usage. In this project, we designed a highly configurable, real-time device that recognizes handwritten letters and digits using an Altera DE1 FPGA kit. To meet the project goals, we followed several engineering standards: the IEEE 754 32-bit floating-point standard, the Video Graphics Array (VGA) display protocol, the Universal Asynchronous Receiver-Transmitter (UART) protocol, and the Inter-Integrated Circuit (I2C) protocol. Adhering to these standards improved the design's compatibility and reusability and simplified verification. Building on them, we designed a 32-bit floating-point (FP) instruction set architecture (ISA) and developed a 5-stage RISC processor in SystemVerilog to manage image processing, matrix multiplication, ML classification, and user interfaces. We implemented and evaluated three ML architectures on our design: Linear Classification (LC), a 784-64-10 fully connected neural network (NN), and a LeNet-like Convolutional Neural Network (CNN) with ReLU activation layers and 36 classes (10 for the digits and 26 for the case-insensitive letters). The models were trained with Python scripts, and the resulting kernels and weights were stored in hex files and loaded into the FPGA's SRAM units. Convolution, pooling, data management, and various other ML operations were driven by firmware written in our custom assembly language. This paper documents the high-level design block diagrams, the interfaces between the SystemVerilog modules, the implementation details of our software and firmware components, and a discussion of potential impacts.
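As an illustration of how trained weights might be exported as IEEE 754 32-bit words for a hex file, the following is a minimal Python sketch. It is not the project's actual export script: the function names, the one-word-per-line layout, and the big-endian digit order are all assumptions made for this example.

```python
import struct


def float_to_hex(value: float) -> str:
    """Return the IEEE 754 single-precision bit pattern of `value` as 8 hex digits.

    Hypothetical helper; big-endian digit order is an assumption.
    """
    return struct.pack(">f", value).hex()


def hex_to_float(word: str) -> float:
    """Inverse of float_to_hex: decode 8 hex digits into a Python float."""
    return struct.unpack(">f", bytes.fromhex(word))[0]


def weights_to_hex_lines(weights):
    """Convert a flat sequence of weights into one 32-bit hex word per line.

    The one-word-per-line layout is assumed here, not taken from the paper.
    """
    return [float_to_hex(w) for w in weights]


if __name__ == "__main__":
    # Made-up example weights, chosen to be exactly representable in float32.
    example_weights = [1.0, -2.5, 0.5]
    for line in weights_to_hex_lines(example_weights):
        print(line)
```

Values that are not exactly representable in single precision are rounded when packed, so decoding a word with `hex_to_float` returns the nearest 32-bit value rather than the original double-precision number.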