A promising pathway for restoring communication in patients with dysarthria and anarthria is speech neuroprostheses, which decode speech directly from cortical neural activity. Two benchmarks, Brain-to-Text '24 and '25, released intracranial recordings from patients with dysarthria along with a baseline algorithm trained with Connectionist Temporal Classification (CTC). Despite significant innovation on these benchmarks, all leading published prior work relies on a WFST-based CTC decoder that requires ${\sim}$320 GB of RAM. These memory requirements limit accessibility for both patients and researchers. Here, we propose LightBeam, a non-WFST-based CTC decoder that requires only ${\sim}$10 GB of RAM and achieves state-of-the-art performance on both benchmarks. LightBeam achieves this by integrating an LLM into the beam-search process via delayed fusion, obviating the need for a large N-gram LM. LightBeam is implemented in Python and is open-source.
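To make the decoding setup concrete, the following is a minimal sketch of CTC prefix beam search with a language-model score applied lazily at word boundaries, illustrating the delayed-fusion idea in which an external LM is consulted only once a hypothesis completes a word. This is not LightBeam's actual implementation: the function name `ctc_prefix_beam_search`, the `word_lm` callable, and the character-level alphabet are all illustrative assumptions, and a real system would plug in an LLM scorer and operate in log-space.

```python
from collections import defaultdict

def ctc_prefix_beam_search(probs, alphabet, beam_width=8, word_lm=None, lm_weight=0.3):
    """Toy CTC prefix beam search (illustrative, not LightBeam's code).

    probs:     T x V rows of per-frame symbol probabilities.
    alphabet:  symbol for each column; '_' is the CTC blank.
    word_lm:   optional callable word -> probability, consulted only when a
               word boundary (space) is emitted -- the 'delayed' fusion step.
    """
    # Each prefix carries (prob ending in blank, prob ending in non-blank).
    beams = {"": (1.0, 0.0)}
    for frame in probs:
        new_beams = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_b, p_nb) in beams.items():
            for sym, p in zip(alphabet, frame):
                if p == 0.0:
                    continue
                if sym == "_":                       # blank keeps the prefix
                    new_beams[prefix][0] += (p_b + p_nb) * p
                elif prefix and sym == prefix[-1]:   # repeated symbol
                    new_beams[prefix][1] += p_nb * p          # collapses
                    new_beams[prefix + sym][1] += p_b * p     # new char via blank
                else:
                    bonus = 1.0
                    if sym == " " and word_lm is not None:
                        word = prefix.rsplit(" ", 1)[-1]      # word just finished
                        bonus = word_lm(word) ** lm_weight    # delayed LM score
                    new_beams[prefix + sym][1] += (p_b + p_nb) * p * bonus
        # Prune to the top beam_width prefixes by total probability.
        top = sorted(new_beams.items(),
                     key=lambda kv: kv[1][0] + kv[1][1], reverse=True)
        beams = {pre: tuple(sc) for pre, sc in top[:beam_width]}
    return max(beams.items(), key=lambda kv: kv[1][0] + kv[1][1])[0]
```

Deferring the LM to word boundaries is what lets a large neural LM replace a per-frame N-gram lookup: the expensive model is queried once per completed word per surviving beam, rather than being compiled into a WFST held in memory.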