High-quality, multi-channel neural recording is indispensable for neuroscience research and clinical applications. Large-scale brain recordings often produce vast amounts of data that must be transmitted wirelessly for subsequent offline analysis and decoding, especially in brain-computer interfaces (BCIs) that use high-density intracortical recordings with hundreds or thousands of electrodes. Transmitting raw neural data, however, is challenging because of limited communication bandwidth and the resulting excessive heating. To address this challenge, we propose a neural signal compression scheme based on convolutional autoencoders (CAEs), which achieves a compression ratio of up to 150 for local field potentials (LFPs). The CAE encoder is implemented on RAMAN, an energy-efficient tinyML accelerator designed for edge computing. RAMAN exploits sparsity in activations and weights through zero skipping, gating, and weight compression. In addition, we employ hardware-software co-optimization by pruning the CAE encoder parameters with a hardware-aware balanced stochastic pruning strategy, which resolves workload imbalance, eliminates indexing overhead, and reduces parameter storage requirements by up to 32.4%. Post-layout simulation shows that the RAMAN encoder can be implemented in a TSMC 65-nm CMOS process, occupying a core area of 0.0187 mm² per channel. Operating at a 2 MHz clock frequency and a 1.2 V supply voltage, the estimated power consumption is 15.1 µW per channel for the proposed DS-CAE1 model. For functional validation, the RAMAN encoder was also deployed on an Efinix Ti60 FPGA, using 37.3k LUTs and 8.6k flip-flops. The compressed neural data from RAMAN are reconstructed offline with signal-to-noise-and-distortion ratios (SNDRs) of 22.6 dB and 27.4 dB and R² scores of 0.81 and 0.94, respectively, evaluated on two monkey neural recordings.
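For reference, the reconstruction-quality figures quoted above correspond to the conventional definitions of SNDR and the coefficient of determination (R²). The sketch below computes both from an original and a reconstructed signal; it is a minimal illustration using standard formulas, with toy data standing in for an LFP segment and its decoder output, and does not reproduce the paper's actual evaluation pipeline.

```python
import numpy as np


def sndr_db(x, x_hat):
    """Signal-to-noise-and-distortion ratio (in dB) of reconstruction x_hat w.r.t. original x."""
    signal_power = np.sum(x ** 2)
    error_power = np.sum((x - x_hat) ** 2)
    return 10.0 * np.log10(signal_power / error_power)


def r2_score(x, x_hat):
    """Coefficient of determination (R^2) between original and reconstructed signal."""
    ss_res = np.sum((x - x_hat) ** 2)          # residual sum of squares
    ss_tot = np.sum((x - np.mean(x)) ** 2)     # total sum of squares
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy stand-in for a single-channel LFP window and its CAE reconstruction.
    t = np.linspace(0.0, 1.0, 1000)
    x = np.sin(2 * np.pi * 10 * t) + 0.1 * rng.standard_normal(t.size)
    x_hat = x + 0.05 * rng.standard_normal(t.size)  # hypothetical decoder output
    print(f"SNDR = {sndr_db(x, x_hat):.1f} dB, R^2 = {r2_score(x, x_hat):.2f}")
```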