UniBCI: Towards a Unified Pretrained Model for Invasive Brain-Computer Interfaces

Modeling invasive neural spike data is fundamental to advancing high-performance brain-computer interfaces (BCIs). However, existing approaches face critical challenges, including small and heterogeneous datasets, cross-domain distribution shift, and the intrinsic spatiotemporal complexity of invasive neural signals. In this work, we propose UniBCI, a unified pretrained model for invasive BCIs. The model integrates three key components: (1) a context-conditioned spatio-temporal tokenization (CST) scheme that embeds neural signals together with metadata into a shared representation space; (2) a hierarchical Interval-Area Attention (IAA) mechanism that captures patterns of spike dynamics in slots via linear attention and local dependencies via sliding-window attention; and (3) a scalable self-supervised masked-signal reconstruction objective for learning generalizable neural representations from large-scale unlabeled data. We construct a pretraining corpus spanning multiple species, subjects, brain regions, and behavioral experiment paradigms. These heterogeneous recordings are standardized via our proposed unified normalization and tokenization. Comprehensive experiments demonstrate that UniBCI achieves state-of-the-art performance across diverse downstream tasks while improving generalization. Moreover, the model achieves a strong balance between accuracy and efficiency, with fewer trainable parameters and lower inference latency. These results suggest that UniBCI provides a practical step toward general-purpose neural foundation models, enabling robust, scalable, and transferable representation learning for invasive neural data. The code for this paper is available at: https://anonymous.4open.science/r/UniBCI-C805.
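To make the masked reconstruction objective concrete, the following is a minimal sketch and not the UniBCI implementation. The abstract does not specify the masking strategy, token layout, or loss, so everything here is an assumption for illustration: binned spike counts of shape (time bins, channels), random masking of whole time bins, zero-filling of masked bins, a mean-squared-error loss, and the hypothetical helper names `mask_tokens` and `masked_mse`.

```python
import numpy as np

def mask_tokens(tokens, mask_ratio, rng):
    """Zero out a random subset of time bins (assumed masking scheme).

    Returns the corrupted copy and a boolean mask over time bins.
    """
    n_steps = tokens.shape[0]
    n_masked = max(1, int(round(mask_ratio * n_steps)))
    masked_idx = rng.choice(n_steps, size=n_masked, replace=False)
    mask = np.zeros(n_steps, dtype=bool)
    mask[masked_idx] = True
    corrupted = tokens.copy()
    corrupted[mask] = 0.0  # masked bins are replaced by zeros
    return corrupted, mask

def masked_mse(prediction, target, mask):
    """Mean squared error computed over the masked time bins only."""
    diff = prediction[mask] - target[mask]
    return float(np.mean(diff ** 2))

# Synthetic stand-in for binned spike counts: 100 time bins, 16 channels.
rng = np.random.default_rng(0)
spikes = rng.poisson(lam=2.0, size=(100, 16)).astype(np.float64)

corrupted, mask = mask_tokens(spikes, mask_ratio=0.25, rng=rng)

# A perfect reconstruction has zero loss; returning the corrupted input does not.
loss_perfect = masked_mse(spikes, spikes, mask)
loss_trivial = masked_mse(corrupted, spikes, mask)
```

In a real pretraining loop, `prediction` would be the output of the model applied to `corrupted`. Restricting the loss to masked bins means the model gets no credit for copying the inputs it can still see.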
