This paper proposes MBURST, a novel multimodal solution for audio-visual speech enhancement that draws on recent neuroscientific findings regarding pyramidal cells in the prefrontal cortex and other brain regions. The underlying burst-propagation mechanism implements several criteria to address the credit assignment problem in a more biologically plausible manner: steering the sign and magnitude of plasticity through feedback, multiplexing feedback and feedforward information across layers through distinct weight connections, approximating feedback and feedforward connections, and linearizing the feedback signals. MBURST exploits these capabilities to learn correlations between the noisy signal and the visual stimuli, thereby attributing meaning to the speech by amplifying relevant information and suppressing noise. Experiments conducted on a Grid Corpus and CHiME3-based dataset show that MBURST reproduces mask reconstructions comparable to those of a multimodal backpropagation-based baseline while demonstrating outstanding energy efficiency, reducing neuron firing rates by up to \textbf{$70\%$}. This property enables more sustainable implementations, well suited for hearing aids and other similar embedded systems.