Tiny Classifier Circuits: Evolving Accelerators for Tabular Data

A typical machine learning (ML) development cycle for edge computing maximises performance during model training and then minimises the memory/area footprint of the trained model for deployment on edge devices targeting CPUs, GPUs, microcontrollers, or custom hardware accelerators. This paper proposes a methodology for automatically generating predictor circuits for the classification of tabular data, with prediction performance comparable to conventional ML techniques while using substantially fewer hardware resources and less power. The methodology uses an evolutionary algorithm to search over the space of logic gates and automatically generates a classifier circuit with maximised training prediction accuracy. The resulting classifier circuits are so small (no more than 300 logic gates) that they are called "Tiny Classifier" circuits, and they can be implemented efficiently as an ASIC or on an FPGA. We empirically evaluate the automatic Tiny Classifier circuit generation methodology, or "Auto Tiny Classifiers", on a wide range of tabular datasets, and compare it against conventional ML techniques such as Amazon's AutoGluon, Google's TabNet, and a neural search over Multi-Layer Perceptrons. Although Tiny Classifiers are constrained to a few hundred logic gates, we observe no statistically significant difference in prediction performance compared with the best-performing ML baseline. When synthesised as a silicon chip, Tiny Classifiers use 8-56x less area and 4-22x less power. When implemented as an ultra-low-cost chip on a flexible substrate (i.e., FlexIC), they occupy 10-75x less area and consume 13-75x less power than the most hardware-efficient ML baseline. On an FPGA, Tiny Classifiers consume 3-11x fewer resources.
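
To make the idea of an evolutionary search over logic gates concrete, the following is a minimal sketch and not the authors' implementation. It assumes that the tabular features have already been encoded as bits, that the task is binary classification, and that the output of the last gate is the predicted class. The gate set, the feed-forward netlist encoding, the (1+λ)-style selection loop, and all names (`evolve`, `mutate`, `random_gate`, and so on) are illustrative assumptions; the abstract does not specify any of them.

```python
import random

# Two-input gate set (an assumption; the abstract does not list the gates used).
GATES = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
    "NAND": lambda a, b: 1 - (a & b),
    "NOR": lambda a, b: 1 - (a | b),
}
GATE_NAMES = list(GATES)


def random_gate(position, n_inputs, rng):
    """Gate number `position` may read any primary input or any earlier gate."""
    n_sources = n_inputs + position
    return (rng.choice(GATE_NAMES), rng.randrange(n_sources), rng.randrange(n_sources))


def random_circuit(n_inputs, n_gates, rng):
    return [random_gate(i, n_inputs, rng) for i in range(n_gates)]


def evaluate(circuit, bits):
    """Run the feed-forward netlist; the last gate's output is the predicted class."""
    signals = list(bits)
    for name, a, b in circuit:
        signals.append(GATES[name](signals[a], signals[b]))
    return signals[-1]


def accuracy(circuit, X, y):
    """Fraction of rows in X whose predicted bit matches the label in y."""
    return sum(evaluate(circuit, x) == t for x, t in zip(X, y)) / len(y)


def mutate(circuit, n_inputs, rng, rate=0.1):
    """Replace each gate with a fresh random gate with probability `rate`."""
    return [
        random_gate(i, n_inputs, rng) if rng.random() < rate else gate
        for i, gate in enumerate(circuit)
    ]


def evolve(X, y, n_gates=20, n_children=4, generations=500, seed=0):
    """(1 + n_children) evolutionary loop that maximises training accuracy."""
    rng = random.Random(seed)
    n_inputs = len(X[0])
    parent = random_circuit(n_inputs, n_gates, rng)
    parent_acc = accuracy(parent, X, y)
    for _ in range(generations):
        children = [mutate(parent, n_inputs, rng) for _ in range(n_children)]
        top_acc, top = max(
            ((accuracy(c, X, y), c) for c in children), key=lambda s: s[0]
        )
        if top_acc >= parent_acc:  # ties are accepted so the search can drift
            parent, parent_acc = top, top_acc
        if parent_acc == 1.0:
            break
    return parent, parent_acc


if __name__ == "__main__":
    # Toy data: label is the majority vote of three input bits.
    X = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
    y = [int(a + b + c >= 2) for a, b, c in X]
    circuit, acc = evolve(X, y, n_gates=10, generations=200)
    print(f"training accuracy: {acc:.2f}")
    print(circuit)
```

The `n_gates` argument plays the role of the gate budget: capping it bounds the size of the evolved circuit in the same spirit as the 300-gate limit mentioned above. Because the fitness is training accuracy only, a real flow would also need held-out validation and a feature-to-bit encoding step, neither of which is shown here.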
