Quantization-Aware Neuromorphic Architecture for Skin Disease Classification on Resource-Constrained Devices

On-device skin lesion analysis is constrained by the compute and energy cost of conventional CNN inference and by the need to update models as new patient data become available. Neuromorphic processors provide event-driven sparse computation and support on-chip incremental learning, yet deployment is often hindered by CNN-to-SNN conversion failures, including non-spike-compatible operators and accuracy degradation under class imbalance. We propose QANA, a quantization-aware CNN backbone embedded in an end-to-end pipeline engineered for conversion-stable neuromorphic execution. QANA replaces conversion-fragile components with spike-compatible transformations by bounding intermediate activations and aligning normalization with low-bit quantization, reducing the conversion-induced distortion that disproportionately affects rare classes. Efficiency is achieved through Ghost-based feature generation under tight FLOP budgets, while spatially aware efficient channel attention and squeeze-and-excitation recalibrate channels without the heavy global operators that are difficult to map to spiking cores. The resulting quantized projection head produces SNN-ready logits and enables incremental updates on edge hardware without full retraining or data offloading. On HAM10000, QANA achieves 91.6% Top-1 accuracy and 91.0% macro F1, improving on the strongest converted SNN baseline by 3.5 percentage points in Top-1 accuracy (a 4.0% relative gain) and by 12.0 percentage points in macro F1 (a 15.2% relative gain). On a clinical dataset, QANA achieves 90.8% Top-1 accuracy and 81.7% macro F1, improving on the strongest converted SNN baseline by 3.2 percentage points in Top-1 accuracy (a 3.7% relative gain) and by 3.6 percentage points in macro F1 (a 4.6% relative gain). When deployed on BrainChip Akida, QANA requires 1.5 ms and 1.7 mJ per image, corresponding to 94.6% lower latency and 99.0% lower energy than its GPU-based CNN implementation.
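
The relationship between the absolute and relative gains quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the helper names are hypothetical, and the baseline scores and GPU reference figures it derives are implied by the reported numbers rather than stated in the abstract.

```python
# Consistency check on the reported figures. All inputs are taken from the
# abstract; baselines and GPU reference values are DERIVED, not reported.

def implied_baseline(score, abs_gain):
    """Baseline score implied by a reported score and its absolute gain (points)."""
    return score - abs_gain


def relative_gain(score, abs_gain):
    """Relative gain in percent, measured against the implied baseline."""
    return 100.0 * abs_gain / implied_baseline(score, abs_gain)


def implied_reference(value, pct_lower):
    """Reference value implied by a measurement that is pct_lower percent below it."""
    return value / (1.0 - pct_lower / 100.0)


# (reported QANA score in %, absolute gain in percentage points)
reported = {
    "HAM10000 Top-1": (91.6, 3.5),
    "HAM10000 macro F1": (91.0, 12.0),
    "Clinical Top-1": (90.8, 3.2),
    "Clinical macro F1": (81.7, 3.6),
}

for name, (score, gain) in reported.items():
    base = implied_baseline(score, gain)
    rel = relative_gain(score, gain)
    print(f"{name}: implied baseline {base:.1f}%, relative gain {rel:.1f}%")

# Akida measurements and the reported reductions versus the GPU implementation.
gpu_latency_ms = implied_reference(1.5, 94.6)
gpu_energy_mj = implied_reference(1.7, 99.0)
print(f"Implied GPU latency: {gpu_latency_ms:.1f} ms per image")
print(f"Implied GPU energy: {gpu_energy_mj:.1f} mJ per image")
```

Under these assumptions, the relative gains round to the quoted 4.0%, 15.2%, 3.7%, and 4.6%, and the reported reductions imply a GPU reference of roughly 27.8 ms and 170 mJ per image.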
