On-edge machine learning (ML) often strives to maximize the intelligence of small models while minimizing the circuit size and power needed to perform inference. To meet these needs, differentiable Logic Gate Networks (LGNs) have demonstrated nanosecond-scale prediction speeds while requiring fewer resources than traditional binary neural networks. Despite these benefits, the trade-offs between LGN parameters and the resulting hardware synthesis characteristics are not well characterized. This paper therefore studies the trade-offs between power, resource utilization, inference speed, and model accuracy when varying the depth and width of LGNs synthesized for Field Programmable Gate Arrays (FPGAs). Results reveal that the final layer of an LGN is critical to minimizing timing and resource usage (i.e., a 28\% decrease), as this layer dictates the logic size of the summing operations. Subject to timing and routing constraints, deeper and wider LGNs can be synthesized for FPGAs when the final layer is narrow. Further trade-offs are presented to help ML engineers select baseline LGN architectures for FPGAs with a set number of Look-Up Tables (LUTs).
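The claim that the final layer dictates the logic size of the summing operations can be illustrated with a minimal sketch. It assumes a formulation that the abstract does not spell out: the binary outputs of the final layer are split into equal-sized groups, one per class, and each group is summed (a popcount) to produce a class score. The function names `group_sum`, `score_bits`, and `predict` are hypothetical and chosen only for illustration.

```python
from typing import List


def group_sum(final_layer_bits: List[int], num_classes: int) -> List[int]:
    """Sum the final layer's binary outputs in equal-sized groups, one per class.

    Assumption: equal-sized grouping; the source does not state the exact scheme.
    """
    width = len(final_layer_bits)
    if width % num_classes != 0:
        raise ValueError("final layer width must be divisible by num_classes")
    group = width // num_classes
    return [
        sum(final_layer_bits[c * group:(c + 1) * group])
        for c in range(num_classes)
    ]


def score_bits(final_width: int, num_classes: int) -> int:
    """Bits needed to hold one class score, a count in [0, final_width // num_classes]."""
    group = final_width // num_classes
    return group.bit_length()


def predict(final_layer_bits: List[int], num_classes: int) -> int:
    """Return the index of the class with the largest score (first index on ties)."""
    scores = group_sum(final_layer_bits, num_classes)
    return scores.index(max(scores))
```

Under this assumption, each class adder takes `final_width // num_classes` one-bit inputs, so narrowing the final layer shrinks every adder as well as the score width. For instance, with 10 classes a final layer of 8000 gates feeds 800 bits into each sum, whereas a final layer of 80 gates feeds only 8. The widths here are illustrative and are not taken from the paper's experiments.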