Spatial covariance matrices of EEG signals are Symmetric Positive Definite (SPD) and lie on a Riemannian manifold, yet the theoretical connection between embedding geometry and optimization dynamics remains unexplored. We provide a formal analysis linking embedding choice to gradient conditioning and numerical stability on SPD manifolds, establishing three theoretical results: (1) BWSPD's $\sqrt{\kappa}$ gradient conditioning (vs $\kappa$ for Log-Euclidean), derived via Daleckiĭ–Kreĭn matrices, yields better-conditioned gradients on high-dimensional inputs ($d \geq 22$), an advantage that shrinks on low-dimensional inputs ($d \leq 8$), where eigendecomposition overhead dominates; (2) Embedding-Space Batch Normalization (BN-Embed) approximates Riemannian normalization up to $O(\varepsilon^2)$ error, yielding $+26\%$ accuracy on 56-channel ERP data but a negligible effect on 8-channel SSVEP data, matching the channel-count-dependent prediction; (3) bi-Lipschitz bounds prove that BWSPD tokens preserve manifold distances with distortion governed solely by the condition number $\kappa$. We validate these predictions with a unified Transformer framework that compares BWSPD, Log-Euclidean, and Euclidean embeddings within an identical architecture across 1,500+ runs on three EEG paradigms (motor imagery, ERP, SSVEP; 36 subjects). Our Log-Euclidean Transformer achieves state-of-the-art performance on all datasets, substantially outperforming classical Riemannian classifiers and recent SPD baselines, while BWSPD offers competitive accuracy with similar training time.
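The $\sqrt{\kappa}$-vs-$\kappa$ claim in result (1) can be checked numerically on the spectral level. The sketch below is illustrative only: it assumes, as is standard, that the Log-Euclidean embedding applies $\log$ to the eigenvalues of the covariance, and that the BWSPD embedding applies the square root (the Bures–Wasserstein square-root factorization). The diagonal of the Daleckiĭ–Kreĭn matrix is then $f'(\lambda_i)$, so the worst-case gradient amplification ratio over the spectrum is $\kappa$ for $f = \log$ and $\sqrt{\kappa}$ for $f = \sqrt{\cdot}$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy SPD "spatial covariance" with a prescribed eigenvalue spread.
# d = 22 matches the paper's high-dimensional regime; kappa = 1e3 is arbitrary.
d = 22
target_eigs = np.geomspace(1e-3, 1.0, d)          # condition number 1e3
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))  # random orthogonal basis
C = Q @ np.diag(target_eigs) @ Q.T

lam = np.linalg.eigvalsh(C)                       # ascending eigenvalues
kappa = lam[-1] / lam[0]

# Diagonal Daleckii-Krein entries are f'(lambda_i); the ratio of the largest
# to smallest entry bounds the gradient conditioning of the embedding map:
#   Log-Euclidean: f(x) = log(x),  f'(x) = 1/x            -> ratio kappa
#   BWSPD (sqrt):  f(x) = sqrt(x), f'(x) = 1/(2*sqrt(x))  -> ratio sqrt(kappa)
log_ratio = (1.0 / lam[0]) / (1.0 / lam[-1])
sqrt_ratio = (1.0 / (2.0 * np.sqrt(lam[0]))) / (1.0 / (2.0 * np.sqrt(lam[-1])))

print(f"kappa           = {kappa:.1f}")
print(f"log grad ratio  = {log_ratio:.1f}")   # grows like kappa
print(f"sqrt grad ratio = {sqrt_ratio:.1f}")  # grows like sqrt(kappa)
```

For $\kappa = 10^3$ the log embedding's amplification ratio is about $10^3$ while the square-root embedding's is about $31.6$, matching the predicted scaling; the full bi-Lipschitz analysis in the paper extends this scalar argument to the off-diagonal Daleckiĭ–Kreĭn entries.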