Offline Learning of Closed-Loop Deep Brain Stimulation Controllers for Parkinson Disease Treatment

Deep brain stimulation (DBS) has shown great promise in treating the motor symptoms of Parkinson's disease (PD) by delivering electrical pulses to the basal ganglia (BG) region of the brain. However, DBS devices approved by the U.S. Food and Drug Administration (FDA) can only deliver continuous DBS (cDBS) stimuli at a fixed amplitude; this energy-inefficient operation reduces the battery lifetime of the device, cannot adapt treatment dynamically to patient activity, and may cause significant side effects (e.g., gait impairment). In this work, we introduce an offline reinforcement learning (RL) framework that uses past clinical data to train an RL policy to adjust the stimulation amplitude in real time, with the goal of reducing energy use while maintaining the same level of treatment (i.e., control) efficacy as cDBS. Moreover, clinical protocols require the safety and performance of such RL controllers to be demonstrated before deployment in patients. Thus, we also introduce an offline policy evaluation (OPE) method that estimates the performance of RL policies from historical data before they are deployed. We evaluated our framework on four PD patients equipped with the RC+S DBS system, employing the RL controllers during monthly clinical visits, with the overall control efficacy evaluated by severity of symptoms (i.e., bradykinesia and tremor), changes in PD biomarkers (i.e., local field potentials), and patient ratings. The results from clinical experiments show that our RL-based controller maintains the same level of control efficacy as cDBS, but with significantly reduced stimulation energy. Further, the OPE method is shown to be effective in accurately estimating and ranking the expected returns of RL controllers.
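
To make the idea of estimating and ranking RL controllers from historical data concrete, the following is a minimal sketch using weighted importance sampling, a standard estimator that is not necessarily the OPE method proposed in this work. Everything in it is a hypothetical illustration: the names `wis_estimate` and `rank_policies`, the scalar state, the discrete amplitude actions, the per-step reward, and the assumption that the logging (behavior) policy's action probabilities are recorded.

```python
from typing import Callable, Dict, List, Tuple

# One logged step: (state, action, reward, behavior_prob).
# Hypothetical encoding: state is a scalar biomarker feature, action is a
# discrete amplitude level, behavior_prob is the probability with which the
# logging controller chose that action.
Step = Tuple[float, int, float, float]
Trajectory = List[Step]
# A candidate policy returns the probability of taking `action` in `state`.
Policy = Callable[[float, int], float]


def wis_estimate(trajectories: List[Trajectory], policy: Policy,
                 gamma: float = 0.99) -> float:
    """Weighted importance sampling estimate of a policy's expected return."""
    weights: List[float] = []
    returns: List[float] = []
    for traj in trajectories:
        weight, ret, discount = 1.0, 0.0, 1.0
        for state, action, reward, behavior_prob in traj:
            # Cumulative likelihood ratio between candidate and logging policy.
            weight *= policy(state, action) / behavior_prob
            ret += discount * reward
            discount *= gamma
        weights.append(weight)
        returns.append(ret)
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no logged trajectory is supported by the policy")
    # Self-normalized average of the logged returns.
    return sum(w * g for w, g in zip(weights, returns)) / total


def rank_policies(trajectories: List[Trajectory],
                  policies: Dict[str, Policy],
                  gamma: float = 0.99) -> List[str]:
    """Return policy names sorted from highest to lowest estimated return."""
    scores = {name: wis_estimate(trajectories, p, gamma)
              for name, p in policies.items()}
    return sorted(scores, key=lambda name: scores[name], reverse=True)
```

In this sketch, a reward that trades off symptom control against stimulation energy would be computed when the data are logged; the estimator only reweights the logged returns, so no candidate controller has to be run on a patient to obtain its score.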
