Long Short-Term Memory (LSTM) neural networks are increasingly used in healthcare applications that demand real-time operation and edge computing. A prominent example is gait analysis, which detects abnormal steps to prevent patients from falling. Given the extremely stringent design requirements on performance, power dissipation, and area, an Application-Specific Integrated Circuit (ASIC) enables efficient real-time use of LSTMs for gait analysis while achieving high accuracy. To the best of our knowledge, this work presents the first cross-layer co-optimized LSTM accelerator for real-time gait analysis targeting an ASIC design. We conduct a comprehensive design space exploration from software down to layout design: we optimize bit-widths at the software level with hardware-aware quantization to reduce hardware complexity, explore various designs at the register-transfer level, and generate alternative layouts to find realizations of the LSTM accelerator that are efficient in terms of hardware complexity and accuracy. The physical synthesis results show that, in a 65 nm technology, the layout of the accelerator optimized for the highest accuracy has a die size of 0.325 mm^2, while the alternative design optimized for hardware complexity, with slightly lower accuracy, occupies 15.4% less area. Moreover, the designed accelerators achieve accurate gait abnormality detection 4.05x faster than the given application requirement.
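To make the idea of software-level bit-width optimization concrete, the following is a minimal sketch, not the method used in this work. It assumes a signed fixed-point format and uses worst-case rounding error as a stand-in for the accuracy criterion; the function names, the sample weights, and the tolerance are all hypothetical and chosen only for illustration.

```python
def quantize_fixed_point(value, total_bits, frac_bits):
    """Round a real value onto a signed fixed-point grid, saturating at the range limits."""
    scale = 1 << frac_bits
    q_min = -(1 << (total_bits - 1))
    q_max = (1 << (total_bits - 1)) - 1
    q = round(value * scale)
    q = max(q_min, min(q_max, q))  # saturate instead of wrapping around
    return q / scale


def max_quantization_error(values, total_bits, frac_bits):
    """Worst-case absolute error over a list of values for one format."""
    return max(abs(v - quantize_fixed_point(v, total_bits, frac_bits)) for v in values)


def smallest_bit_width(values, int_bits, tolerance, max_bits=16):
    """Sweep total bit-widths upward (1 sign bit + int_bits + fractional bits)
    and return the first width whose worst-case error is within tolerance,
    or None if no width up to max_bits qualifies."""
    for total_bits in range(int_bits + 1, max_bits + 1):
        frac_bits = total_bits - 1 - int_bits
        if max_quantization_error(values, total_bits, frac_bits) <= tolerance:
            return total_bits
    return None


# Hypothetical weights in (-1, 1); illustrative values, not data from this work.
weights = [0.7312, -0.1187, 0.4051, -0.9420, 0.0633]
chosen_bits = smallest_bit_width(weights, int_bits=0, tolerance=0.01)
print(chosen_bits)
```

In a real flow the stopping criterion would more plausibly be the classification accuracy of the quantized network on validation data rather than a per-weight error bound; the sweep structure stays the same.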