1D-CNNs play a crucial role in time-series analysis on tiny smart sensor systems, e.g. for biosignal analysis, predictive maintenance, or structural health monitoring. LUT-based precomputation has emerged as a promising optimization technique for implementing such neural networks on FPGAs. The core idea is to precompute all possible outputs of a neural network layer and store them directly in the lookup tables (LUTs) of the FPGA. This yields highly resource-efficient networks with ultra-low latency, but it scales poorly because the number of table entries grows exponentially with the number of input bits of a layer. Previous work has explored depthwise-separable convolutions to improve scalability. In this paper, we generalize this approach to additional forms of grouped convolutions. Based on this, we propose a novel type of convolutional block and an algorithm to guide the choice of hyperparameters for this block. We evaluate our approach on the task of predicting atrial fibrillation from ECG recordings, using the MIT-BIH database. The resulting hardware accelerators are small enough to be deployed on an AMD Spartan 7 S15. They achieve an F1 score of up to 95% while requiring only 2,844 LUTs and no DSPs or BRAM.
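The precomputation idea and its scaling problem can be illustrated with a minimal Python sketch. It is not the paper's implementation: the toy binarized neuron, the function names (`neuron`, `precompute_lut`, `table_entries`), and the weights are all hypothetical, and the counts are logical truth-table entries rather than physical FPGA LUTs. The sketch enumerates every input pattern of a small neuron into a table, then shows how splitting the input channels into groups shrinks the table needed per output value.

```python
from itertools import product


def neuron(inputs, weights, threshold):
    """Toy binarized neuron (assumption): 1 if the weighted sum reaches the threshold."""
    return int(sum(w * x for w, x in zip(weights, inputs)) >= threshold)


def precompute_lut(weights, threshold):
    """Enumerate every binary input pattern and store the neuron's output for it."""
    n = len(weights)
    return {
        bits: neuron(bits, weights, threshold)
        for bits in product((0, 1), repeat=n)
    }


def table_entries(in_channels, kernel_size, bits_per_value, groups=1):
    """Truth-table entries per output value of a (grouped) 1D convolution.

    Each output only sees in_channels / groups channels, so its fan-in is
    (in_channels / groups) * kernel_size * bits_per_value bits.
    """
    if in_channels % groups != 0:
        raise ValueError("in_channels must be divisible by groups")
    fan_in_bits = (in_channels // groups) * kernel_size * bits_per_value
    return 2 ** fan_in_bits


# Hypothetical 3-input neuron: inference becomes a single table lookup.
weights = [1, -1, 1]
lut = precompute_lut(weights, threshold=1)

# Table size for 8 input channels, kernel size 3, 1-bit activations.
standard = table_entries(8, 3, 1, groups=1)   # full convolution
grouped = table_entries(8, 3, 1, groups=4)    # 2 channels per group
depthwise = table_entries(8, 3, 1, groups=8)  # 1 channel per group
```

Under these assumptions, the full convolution needs 2^24 entries per output value, the four-group variant 2^6, and the depthwise variant 2^3, which is why restricting each output to a subset of channels makes the lookup-table approach tractable.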