Spiking neural networks (SNNs) have tremendous potential for energy-efficient neuromorphic chips due to their binary and event-driven architecture. SNNs have been used primarily in classification tasks, with limited exploration of image generation tasks. To fill this gap, we propose Spiking-Diffusion, a model based on the vector quantized discrete diffusion model. First, we develop a vector quantized variational autoencoder with SNNs (VQ-SVAE) to learn a discrete latent space for images. In VQ-SVAE, image features are encoded using both the spike firing rate and the postsynaptic potential, and an adaptive spike generator is designed to restore embedding features in the form of spike trains. Next, we perform absorbing state diffusion in the discrete latent space and construct a spiking diffusion image decoder (SDID) with SNNs to denoise the image. Our work is the first to build a diffusion model entirely from SNN layers. Experimental results on MNIST, FMNIST, KMNIST, Letters, and CIFAR-10 demonstrate that Spiking-Diffusion outperforms the existing SNN-based generation model. We achieve FIDs of 37.50, 91.98, 59.23, 67.41, and 120.5 on these datasets, respectively, corresponding to FID reductions of 58.60\%, 18.75\%, 64.51\%, 29.75\%, and 44.88\% compared with the state-of-the-art work. Our code will be available at \url{https://github.com/Arktis2022/Spiking-Diffusion}.
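To make the absorbing state diffusion step concrete, the following is a minimal sketch of the forward (corruption) process on a grid of discrete latent codes. It is an illustration under stated assumptions, not the implementation used in this work: the mask id `MASK`, the function name `absorb`, the linear schedule `t / T`, and the toy codebook size of 128 are all hypothetical choices.

```python
import random

MASK = -1  # hypothetical id for the absorbing [MASK] state


def absorb(tokens, t, T, rng):
    """Corrupt a sequence of codebook indices at diffusion step t of T.

    Each token is independently replaced by MASK with probability t / T
    (an assumed linear schedule). A token that is already MASK stays MASK,
    which is what makes the state "absorbing".
    """
    if not 0 <= t <= T:
        raise ValueError("t must lie in [0, T]")
    p = t / T
    return [MASK if (tok == MASK or rng.random() < p) else tok for tok in tokens]


rng = random.Random(0)
# Toy latent: a flattened 4x4 grid of indices into an assumed 128-entry codebook.
latent = [rng.randrange(128) for _ in range(16)]

clean = absorb(latent, 0, 100, rng)    # t = 0: nothing is masked
half = absorb(latent, 50, 100, rng)    # t = T/2: roughly half the tokens are masked
full = absorb(latent, 100, 100, rng)   # t = T: every token is masked
```

In this sketch the reverse process would start from the fully masked grid and let a denoising network predict the original code indices step by step; in Spiking-Diffusion that role is played by the SDID, whose details are not reproduced here.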