A typical machine learning (ML) development cycle for edge computing maximises performance during model training and then minimises the memory/area footprint of the trained model for deployment on edge devices based on CPUs, GPUs, microcontrollers, or custom hardware accelerators. This paper proposes a methodology for automatically generating predictor circuits for the classification of tabular data. These circuits achieve prediction performance comparable to conventional ML techniques while using substantially fewer hardware resources and less power. The methodology uses an evolutionary algorithm to search the space of logic gates and automatically generates a classifier circuit that maximises training prediction accuracy. The resulting classifier circuits are so small (i.e., no more than 300 logic gates) that we call them "Tiny Classifier" circuits, and they can be implemented efficiently as an ASIC or on an FPGA. We empirically evaluate this automatic Tiny Classifier circuit generation methodology, "Auto Tiny Classifiers", on a wide range of tabular datasets and compare it against conventional ML techniques such as Amazon's AutoGluon, Google's TabNet, and a neural search over Multi-Layer Perceptrons. Although Tiny Classifiers are constrained to a few hundred logic gates, we observe no statistically significant difference in prediction performance compared with the best-performing ML baseline. When synthesised as a silicon chip, Tiny Classifiers use 8-18× less area and 4-8× less power. When implemented as an ultra-low-cost chip on a flexible substrate (i.e., FlexIC), they occupy 10-75× less area and consume 13-75× less power than the most hardware-efficient ML baseline. On an FPGA, Tiny Classifiers consume 3-11× fewer resources.
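To make the idea of "evolving a classifier out of logic gates" concrete, the following is a minimal sketch under stated assumptions; it is not the authors' implementation. It assumes the tabular features have already been binarised into bits, uses a hypothetical gate set (AND, OR, XOR, NAND, NOR), and swaps in a generic Cartesian-genetic-programming-style (1 + λ) evolution strategy with training accuracy as the fitness. All function names (`evolve`, `evaluate`, `mutate`, and so on) and parameters are illustrative. The 300-gate limit is the only detail taken from the abstract.

```python
import random
from typing import List, Sequence, Tuple

MAX_GATES = 300  # circuit-size budget quoted in the abstract

# Hypothetical two-input gate library; the paper's actual gate set may differ.
GATES = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
    "NAND": lambda a, b: 1 - (a & b),
    "NOR": lambda a, b: 1 - (a | b),
}
GATE_NAMES = list(GATES)

# A gene is (gate name, operand index, operand index). Indices 0..n_inputs-1 are
# input bits; index n_inputs + k is the output of gate k (feed-forward only).
Gene = Tuple[str, int, int]


def random_gene(position: int, n_inputs: int, rng: random.Random) -> Gene:
    """Draw a gate whose operands are inputs or strictly earlier gates."""
    limit = n_inputs + position
    return (rng.choice(GATE_NAMES), rng.randrange(limit), rng.randrange(limit))


def evaluate(circuit: List[Gene], bits: Sequence[int]) -> int:
    """Simulate the circuit on one binarised row; the last gate is the class bit."""
    values = list(bits)
    for gate, a, b in circuit:
        values.append(GATES[gate](values[a], values[b]))
    return values[-1]


def accuracy(circuit: List[Gene], X, y) -> float:
    """Fraction of training rows the circuit classifies correctly (the fitness)."""
    return sum(evaluate(circuit, row) == label for row, label in zip(X, y)) / len(y)


def mutate(circuit: List[Gene], n_inputs: int, rng: random.Random,
           n_mutations: int = 1) -> List[Gene]:
    """Point mutation: replace randomly chosen genes with fresh random gates."""
    child = list(circuit)
    for _ in range(n_mutations):
        i = rng.randrange(len(child))
        child[i] = random_gene(i, n_inputs, rng)
    return child


def evolve(X, y, n_gates: int = 20, offspring: int = 4,
           generations: int = 1000, seed: int = 0):
    """(1 + lambda) strategy: the best child replaces the parent if it is no worse."""
    if n_gates > MAX_GATES:
        raise ValueError("circuit exceeds the gate budget")
    rng = random.Random(seed)
    n_inputs = len(X[0])
    parent = [random_gene(i, n_inputs, rng) for i in range(n_gates)]
    best = accuracy(parent, X, y)
    history = [best]
    for _ in range(generations):
        children = [mutate(parent, n_inputs, rng) for _ in range(offspring)]
        scores = [accuracy(c, X, y) for c in children]
        top = max(range(offspring), key=scores.__getitem__)
        if scores[top] >= best:  # ">=" permits neutral drift, as is common in CGP
            parent, best = children[top], scores[top]
        history.append(best)
    return parent, best, history


# Hypothetical toy target: y = (x0 AND x1) XOR x2 over all 3-bit inputs.
X = [[(i >> k) & 1 for k in range(3)] for i in range(8)]
y = [(r[0] & r[1]) ^ r[2] for r in X]
circuit, train_acc, history = evolve(X, y)
print(f"gates={len(circuit)} train_accuracy={train_acc:.2f}")
```

Accepting children that tie with the parent lets the search drift across equally accurate circuits instead of stalling. Because the parent is only ever replaced by a child that is at least as accurate, training accuracy never decreases over generations. A hardware flow would then export the best circuit as a netlist for ASIC or FPGA synthesis, which this sketch does not attempt.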