This paper proposes MBURST, a novel multimodal solution for audio-visual speech enhancement that draws on recent neurological discoveries regarding pyramidal cells of the prefrontal cortex and other brain regions. The underlying mechanism, burst propagation, implements several criteria to address the credit assignment problem in a more biologically plausible manner: steering the sign and magnitude of plasticity through feedback, multiplexing feedback and feedforward information across layers through different weight connections, approximating feedback and feedforward connections, and linearizing the feedback signals. MBURST uses these capabilities to learn correlations between the noisy signal and the visual stimuli, attributing meaning to the speech by amplifying relevant information and suppressing noise. Experiments on a dataset based on the Grid Corpus and CHiME3 show that MBURST produces mask reconstructions similar to those of a multimodal backpropagation-based baseline while being considerably more energy efficient, reducing neuron firing rates by up to \textbf{$70\%$}. This property points toward more sustainable implementations, which are desirable for hearing aids and similar embedded systems.
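To make the notion of mask-based enhancement concrete, the following is a minimal sketch of how a time-frequency mask can be applied to a noisy magnitude spectrogram. It is an illustration only: the abstract does not state which mask type MBURST predicts, so the ratio mask used here, the function names `ratio_mask` and `apply_mask`, the additive mixing of magnitudes, and the toy values are all assumptions and not the authors' implementation.

```python
# Hypothetical illustration: the mask type, function names, and toy data
# are assumptions, not taken from the MBURST paper.

def ratio_mask(clean_mag, noise_mag, eps=1e-8):
    """Per-bin mask in [0, 1]: the share of each bin's energy due to speech."""
    return [
        [c * c / (c * c + n * n + eps) for c, n in zip(c_row, n_row)]
        for c_row, n_row in zip(clean_mag, noise_mag)
    ]

def apply_mask(noisy_mag, mask):
    """Scale each noisy bin by its mask value, keeping speech-dominated bins."""
    return [
        [x * m for x, m in zip(x_row, m_row)]
        for x_row, m_row in zip(noisy_mag, mask)
    ]

# Toy 2x2 "spectrograms" (rows = frames, columns = frequency bins).
clean = [[1.0, 0.0], [0.5, 0.0]]
noise = [[0.0, 1.0], [0.5, 0.2]]
# Simplifying assumption: magnitudes add when speech and noise are mixed.
noisy = [[c + n for c, n in zip(c_row, n_row)] for c_row, n_row in zip(clean, noise)]

mask = ratio_mask(clean, noise)
enhanced = apply_mask(noisy, mask)
```

In an enhancement model the mask would be predicted from the noisy audio and the visual stream rather than computed from the clean signal, which is available only at training time.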