Cross-Species Transfer Learning for Electrophysiology-to-Transcriptomics Mapping in Cortical GABAergic Interneurons

Single-cell electrophysiological recordings provide a powerful window into neuronal functional diversity and offer an interpretable route for linking intrinsic physiology to transcriptomic identity. Here, we replicate and extend the electrophysiology-to-transcriptomics framework introduced by Gouwens et al. (2020) using publicly available Allen Institute Patch-seq datasets from both mouse and human cortex. We focus on GABAergic inhibitory interneurons because their subclass structure (Lamp5, Pvalb, Sst, Vip) is conserved and comparable across species. After quality control, we analyzed 3,699 mouse visual cortex neurons and 506 human neocortical neurons from neurosurgical resections. Using standardized electrophysiological features and sparse PCA (sPCA), we reproduced the major class-level separations reported in the original mouse study. For supervised prediction, a class-balanced random forest provided a strong feature-engineered baseline in mouse data and a weaker but still informative baseline in human data. We then developed an attention-based BiLSTM that operates directly on the structured IPFX feature-family representation, avoiding sPCA and providing feature-family-level interpretability via learned attention weights. Finally, we evaluated a cross-species transfer setting in which the sequence model is pretrained on mouse data and fine-tuned on human data for an aligned 4-class task; this improved human macro-F1 relative to a human-only training baseline. Together, these results confirm the reproducibility of the Gouwens pipeline in mouse data, demonstrate that sequence models can match feature-engineered baselines, and show that mouse-to-human transfer learning can provide measurable gains for human subclass prediction.
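Macro-F1 is the headline metric for the aligned 4-class task, so a minimal sketch of how it can be computed may help readers interpret the reported gains. The function below follows the standard definition (the unweighted mean of per-class F1 scores); the function name, the label list constant, and the toy labels are illustrative assumptions and are not taken from the study's code or data.

```python
# Illustrative sketch only: standard macro-F1, applied to made-up labels.
SUBCLASSES = ["Lamp5", "Pvalb", "Sst", "Vip"]  # the four aligned subclasses


def macro_f1(y_true, y_pred, labels=SUBCLASSES):
    """Return the unweighted mean of per-class F1 scores."""
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # A class that is never predicted and never present scores 0 here.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical, imbalanced toy set: a classifier that always predicts Pvalb.
y_true = ["Pvalb"] * 6 + ["Sst"] * 2 + ["Vip"] * 1 + ["Lamp5"] * 1
y_pred = ["Pvalb"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred)
print(f"accuracy = {accuracy:.4f}, macro-F1 = {score:.4f}")
# prints: accuracy = 0.6000, macro-F1 = 0.1875
```

In this toy case the majority-class predictor reaches 60% accuracy but a macro-F1 of only 0.1875, because every subclass contributes equally regardless of its size. That property is presumably why macro-F1 suits a small, imbalanced human dataset better than raw accuracy.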
