Deep learning (DL) has big-data processing capabilities that are as good as, or even better than, those of humans in many real-world domains, but at the cost of high energy requirements that may be unsustainable in some applications and of errors that, though infrequent, can be large. We hypothesise that a fundamental weakness of DL lies in its intrinsic dependence on integrate-and-fire point neurons that maximise information transmission irrespective of whether that information is relevant in the current context. This leads to unnecessary neural firing and to the feedforward transmission of conflicting messages, which makes learning difficult and processing energy-inefficient. Here we show how to circumvent these limitations by mimicking the capabilities of context-sensitive neocortical neurons, which receive input from diverse sources as a context that amplifies the transmission of relevant information and attenuates the transmission of irrelevant information. Our results show that, in the case of audio-visual processing, nets composed of context-sensitive local processors can use video information as a context that guides audio signal processing towards the currently relevant information far more effectively and efficiently than current forms of DL.