Temporal credit assignment for one-shot learning utilizing a phase transition material

Alessandro R. Galloni, Yifan Yuan, Minning Zhu, Haoming Yu, Ravindra S. Bisht, Chung-Tse Michael Wu, Christine Grienberger, Shriram Ramanathan, Aaron D. Milstein

From arXiv: 37 pages, 5 figures, 6 supplementary figures

Designing hardware based on the biological principles of neuronal computation and plasticity in the brain is a leading approach to realizing energy- and sample-efficient artificial intelligence and learning machines. An important factor in selecting the hardware building blocks is identifying candidate materials whose physical properties can emulate the large dynamic ranges and varied timescales of neuronal signaling. Previous work has shown that the all-or-none spiking behavior of neurons can be mimicked by threshold switches that exploit phase transitions. Here we demonstrate that devices based on a prototypical metal-insulator-transition material, vanadium dioxide (VO2), can be dynamically controlled to access a continuum of intermediate resistance states. Furthermore, the timescale of their intrinsic relaxation can be configured to match a range of biologically relevant timescales, from milliseconds to seconds. We exploit these device properties to emulate three aspects of neuronal analog computation: fast (~1 ms) spiking in a neuronal soma compartment, slow (~100 ms) spiking in a dendritic compartment, and ultraslow (~1 s) biochemical signaling involved in temporal credit assignment for a recently discovered biological mechanism of one-shot learning. Simulations show that an artificial neural network using properties of VO2 devices to control an agent navigating a spatial environment can learn an efficient path to a reward in up to 4-fold fewer trials than standard methods. The phase relaxations described in our study may be engineered in a variety of materials and can be controlled by thermal, electrical, or optical stimuli, suggesting further opportunities to emulate biological learning.
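The key idea above is that a device state which relaxes on a configurable timescale can act as a decaying memory of past activity. The sketch below is a minimal illustration of that idea under stated assumptions, not the authors' device model or network. It assumes simple first-order exponential relaxation and a hypothetical multiplicative credit-assignment rule. The names `relaxation_trace` and `weight_update`, the 0.5 s delay, and the one-directional trace are all illustrative choices. Using the three timescales named in the abstract, it shows why only the ultraslow (~1 s) trace can bridge a sub-second-to-second gap between an input and a later instructive signal.

```python
import math


def relaxation_trace(t_since_set: float, tau: float, x0: float = 1.0) -> float:
    """Value of a first-order relaxing state variable t_since_set seconds after
    it was set to x0. Exponential relaxation toward 0 is an assumption here."""
    if t_since_set < 0:
        return 0.0  # the trace has not been set yet
    return x0 * math.exp(-t_since_set / tau)


def weight_update(t_input: float, t_instructive: float, tau: float,
                  lr: float = 1.0) -> float:
    """Hypothetical credit-assignment rule: an input at t_input sets an
    eligibility trace, and an instructive signal at t_instructive converts
    whatever trace remains into a weight change scaled by lr."""
    return lr * relaxation_trace(t_instructive - t_input, tau)


# Relaxation time constants (seconds) matching the timescales in the abstract.
TIMESCALES = {
    "soma (~1 ms)": 1e-3,
    "dendrite (~100 ms)": 0.1,
    "biochemical (~1 s)": 1.0,
}

if __name__ == "__main__":
    delay = 0.5  # illustrative gap between input and instructive signal, in s
    for name, tau in TIMESCALES.items():
        dw = weight_update(t_input=0.0, t_instructive=delay, tau=tau)
        print(f"{name}: dw = {dw:.3g}")
```

With a 0.5 s delay, the ~1 ms trace has decayed to essentially zero and the ~100 ms trace to under 1% of its initial value, while the ~1 s trace still retains about 60%. The slowest relaxation is therefore the only one that can link an input to an instructive signal arriving that much later.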
