This paper implements a YOLOv3-tiny-like object detector based on a Binary Neural Network (BNN) on a low-cost FPGA. The network takes 320×320×3 RGB images as input. Its main convolution layers use 1-bit weights and 8-bit activations, while Conv1 and the final detection head use fixed-point standard convolutions. Weights, biases, and quantization parameters are extracted from the trained ONNX model, converted to fixed point, packed into COE files, and stored in Vivado BRAM ROMs. The hardware is written entirely in Verilog RTL and covers padding, line buffering, binary convolution, quantization post-processing, max pooling, and detection-head computation. In layers where Mul_prev is indexed by input channel and Div_current by output channel, Mul_prev is fused into the BNN processing element (PE), so channel-wise compensation is applied during accumulation. On VOC, the model achieves 39.6% mAP50 with 0.098 GFLOPs and 0.74 M parameters. RTL simulation shows that the final raw detection output matches the corresponding ONNX node with a correlation coefficient of 0.999964 and a mean absolute error of 0.020027.
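The parameter-export step (fixed-point conversion, then COE packing for BRAM ROMs) can be sketched in Python. This is a minimal sketch, not the paper's tooling. The signed fixed-point format with `frac_bits` fractional bits, round-to-nearest with saturation, the sign-to-bit mapping for binary weights (bit 1 for a non-negative weight, i.e. +1), MSB-first packing, and the hexadecimal radix are all assumptions, and `to_fixed`, `pack_binary_weights`, and `to_coe` are hypothetical helper names. Only the `memory_initialization_radix` / `memory_initialization_vector` keywords come from the Xilinx COE format.

```python
import numpy as np

def to_fixed(values, frac_bits, total_bits):
    """Quantize floats to signed fixed point (assumed format).

    Rounds to nearest (np.round breaks ties to even) and saturates
    to the signed range of total_bits.
    """
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    scaled = np.round(np.asarray(values, dtype=np.float64) * scale)
    return np.clip(scaled, lo, hi).astype(np.int64)

def pack_binary_weights(weights, word_bits):
    """Map weight signs to bits (1 for >= 0, 0 for < 0), pack MSB-first."""
    bits = (np.asarray(weights).ravel() >= 0).astype(np.int64)
    pad = (-len(bits)) % word_bits          # zero-pad the last word
    bits = np.concatenate([bits, np.zeros(pad, dtype=np.int64)])
    words = []
    for start in range(0, len(bits), word_bits):
        word = 0
        for b in bits[start:start + word_bits]:
            word = (word << 1) | int(b)
        words.append(word)
    return words

def to_coe(words, word_bits):
    """Render words as a COE file; negatives become two's complement."""
    digits = (word_bits + 3) // 4
    mask = (1 << word_bits) - 1
    body = ",\n".join(f"{int(w) & mask:0{digits}X}" for w in words)
    return ("memory_initialization_radix=16;\n"
            "memory_initialization_vector=\n" + body + ";\n")
```

For example, the biases `[0.5, -0.25, 1.0]` in a 16-bit format with 8 fractional bits become `128, -64, 256`, which `to_coe` writes as `0080`, `FFC0`, `0100`. Packing several 1-bit weights into one ROM word lets a single read fetch a whole kernel slice. The word width would typically be chosen to match how many weights the PE consumes per cycle.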
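The need for fusing Mul_prev into the PE follows from the accumulation itself. Assume Mul_prev is a multiplicative per-input-channel scale and Div_current is a per-output-channel divisor. One output is then y[o] = (1 / Div_current[o]) · Σ_i Mul_prev[i] · Σ_k w[o,i,k] · a[i,k]. A factor indexed by the output channel can be applied once after the sum. Mul_prev, however, changes inside the sum over input channels, so it cannot be factored out and must scale each channel's partial sum during accumulation. The NumPy sketch below models this behavior. It is not the paper's RTL. The 0/1 weight encoding, the integer fixed-point form of Mul_prev, the `frac_bits` parameter, and the function names `bnn_pe_fused` and `apply_div_current` are assumptions.

```python
import numpy as np

def bnn_pe_fused(acts, w_bits, mul_prev_q):
    """Behavioral model of one binary-conv PE window (hypothetical layout).

    acts:       int array [C_in, K, K] of 8-bit activations (one window)
    w_bits:     0/1 array [C_out, C_in, K, K]; 1 encodes +1, 0 encodes -1
    mul_prev_q: int array [C_in]; per-input-channel fixed-point compensation
    returns:    int64 array [C_out] of accumulators
    """
    signs = 2 * np.asarray(w_bits, dtype=np.int64) - 1   # {0, 1} -> {-1, +1}
    a = np.asarray(acts, dtype=np.int64)
    # With +/-1 weights each product is just an add or subtract of the activation
    partial = (signs * a).sum(axis=(2, 3))               # [C_out, C_in]
    # Mul_prev depends on the input channel, so it scales each partial sum
    # before the sum over C_in (fused into accumulation), not after it
    return (partial * np.asarray(mul_prev_q, dtype=np.int64)).sum(axis=1)

def apply_div_current(acc, div_current, frac_bits):
    """Per-output-channel rescale after accumulation (hypothetical post-processing)."""
    acc_f = np.asarray(acc, dtype=np.float64) / (1 << frac_bits)
    return acc_f / np.asarray(div_current, dtype=np.float64)
```

For example, with two 1×1 input channels both equal to 1, `mul_prev_q = [256, 512]`, and weight bits `[1, 1]` and `[1, 0]` for two output channels, the accumulators are `768` and `-256`. Dividing by `Div_current = [3.0, 1.0]` with 8 fractional bits gives `1.0` and `-1.0`. In this formulation the inner sum needs only adders. The only remaining multiplications are the C_in Mul_prev products per output, which keeps the fusion inexpensive.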