1D-CNNs play a crucial role in time-series analysis on tiny smart sensor systems, e.g., for biosignal analysis, predictive maintenance, or structural health monitoring. LUT-based precomputation has emerged as a promising optimization technique for implementing such neural networks on FPGAs. The core idea is to precompute all possible outputs of a neural network layer and store them directly in the lookup tables (LUTs) of the FPGA. This yields highly resource-efficient networks with ultra-low latency, but the approach scales poorly. Previous work has explored depthwise-separable convolutions to improve scalability. In this paper, we generalize this approach to additional forms of grouped convolutions. Based on this, we propose a novel type of convolutional block and an algorithm to guide the choice of hyperparameters for this block. We evaluate our approach on the task of predicting atrial fibrillation from ECG recordings in the MIT-BIH database. The resulting hardware accelerators are small enough to be deployed on an AMD Spartan 7 S15. They achieve an F1 score of up to 95% while requiring only 2,844 LUTs and no DSPs or BRAM.
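The scalability problem and the effect of grouping can be illustrated with a small sketch. It is not the method of the paper: the function names, the 1-bit thresholded output, and the example layer sizes are all assumptions chosen for illustration. The sketch enumerates every input combination of a single output value to build its truth table, and counts how the number of table entries depends on the `groups` parameter of a 1D convolution.

```python
from itertools import product


def fan_in_bits(in_channels, kernel_size, groups, bits_per_input):
    """Input bits feeding one output value of a grouped 1D convolution."""
    if in_channels % groups != 0:
        raise ValueError("in_channels must be divisible by groups")
    return (in_channels // groups) * kernel_size * bits_per_input


def table_entries(in_channels, kernel_size, groups, bits_per_input):
    """Truth-table rows needed to precompute one output value."""
    return 2 ** fan_in_bits(in_channels, kernel_size, groups, bits_per_input)


def precompute_table(weights, bias, bits_per_input):
    """Enumerate all quantized inputs of one output and store the result.

    Assumption: inputs are unsigned integers with `bits_per_input` bits and
    the output is a single bit obtained by thresholding at zero.
    """
    levels = range(2 ** bits_per_input)
    table = {}
    for inputs in product(levels, repeat=len(weights)):
        acc = bias + sum(w * x for w, x in zip(weights, inputs))
        table[inputs] = 1 if acc > 0 else 0
    return table


if __name__ == "__main__":
    # Hypothetical layer: 8 input channels, kernel size 3, 2-bit activations.
    for groups in (1, 4, 8):
        rows = table_entries(8, 3, groups, 2)
        print(f"groups={groups}: {rows} table entries per output value")

    # Tiny precomputed table for one output with three 1-bit inputs.
    table = precompute_table([1, -1, 2], bias=0, bits_per_input=1)
    print(len(table), "entries")
```

With these assumed sizes, an ungrouped convolution needs 2^48 entries per output value, while the depthwise case (`groups` equal to the number of input channels) needs only 64. This exponential dependence on fan-in is why grouping makes precomputation tractable.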