A major goal in neuroscience is to discover neural data representations that generalize. This goal is challenged by variability across recording sessions (e.g., environment), subjects (e.g., varying neural structures), and sensors (e.g., sensor noise), among other factors. Recent work has begun to address generalization across sessions and subjects, but few studies examine robustness to sensor failure, which is highly prevalent in neuroscience experiments. To address these dimensions of generalizability, we first collect our own electroencephalography (EEG) dataset with numerous sessions, subjects, and sensors, then study two time series models: EEGNet (Lawhern et al., 2018) and TOTEM (Talukder et al., 2024). EEGNet is a widely used convolutional neural network, while TOTEM is a discrete time series tokenizer and transformer model. We find that TOTEM outperforms or matches EEGNet across all generalizability cases. Finally, through analysis of TOTEM's latent codebook, we observe that tokenization enables generalization.