Physics-informed and Unsupervised Riemannian Domain Adaptation for Machine Learning on Heterogeneous EEG Datasets

Combining electroencephalogram (EEG) datasets for supervised machine learning (ML) is challenging due to session, subject, and device variability. ML algorithms typically require identical features at train and test time, which complicates analysis when the number and positions of sensors vary across datasets. Simple channel selection discards valuable data and degrades performance, especially when datasets share few channels. To address this, we propose an unsupervised approach that leverages the physics of EEG signals. We map EEG channels to fixed positions using field interpolation, which facilitates source-free domain adaptation. Combined with Riemannian geometry classification pipelines and transfer learning steps, our method demonstrates robust performance in brain-computer interface (BCI) tasks and potential biomarker applications. We compared it against a statistics-based approach known as Dimensionality Transcending, a signal-based imputation method called ComImp, source-dependent methods, common channel selection, and spherical spline interpolation, using leave-one-dataset-out validation on six public BCI datasets for a right-hand/left-hand classification task. Numerical experiments show that when train and test data share few channels, field interpolation consistently outperforms the other methods, with enhanced classification performance across all datasets. When more channels are shared, field interpolation is competitive with other methods and faster to compute than source-dependent methods.
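
To make the evaluation setup concrete, the following is a minimal sketch, not the authors' implementation. It assumes that interpolation to a fixed montage can be written as a linear map W applied to the channels, so that a trial covariance C becomes W C Wᵀ, and it shows how leave-one-dataset-out splits are formed. The dataset names, channel counts, target montage size, and the random matrix W are all hypothetical placeholders; in particular, W here is not a real field interpolation operator.

```python
import numpy as np


def map_covariance(cov, W):
    """Map an (n_src, n_src) covariance to the target montage: W @ cov @ W.T."""
    return W @ cov @ W.T


def leave_one_dataset_out(names):
    """Yield (train_names, test_name), holding out each dataset exactly once."""
    for held_out in names:
        yield [n for n in names if n != held_out], held_out


rng = np.random.default_rng(0)
n_target = 4  # hypothetical number of fixed target positions
channel_counts = {"A": 3, "B": 5, "C": 8}  # hypothetical datasets and montages

features = {}
for name, n_ch in channel_counts.items():
    # One synthetic trial (channels x samples) standing in for real EEG.
    X = rng.standard_normal((n_ch, 200))
    cov = X @ X.T / X.shape[1]
    # Placeholder linear map; a real pipeline would derive W from sensor
    # positions and a physical model instead of random numbers.
    W = rng.standard_normal((n_target, n_ch))
    features[name] = map_covariance(cov, W)

# Every dataset now yields covariances of the same shape, so a single
# classifier can be trained on some datasets and tested on another.
splits = list(leave_one_dataset_out(list(features)))
for train, test in splits:
    print(f"train on {train}, test on {test}")
```

Note that a mapped covariance has rank at most the number of source channels, so it is rank-deficient whenever the target montage has more positions than the source dataset; a Riemannian pipeline operating on such matrices would need regularization or dimensionality reduction.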
