In the Machine Learning (ML) literature, a well-known problem is Dataset Shift: contrary to the standard ML assumption, the data in the training and test sets can follow different probability distributions, leading ML systems to generalise poorly. This problem is particularly acute in the Brain-Computer Interface (BCI) context, where bio-signals such as electroencephalographic (EEG) signals are often used. Indeed, EEG signals are highly non-stationary both over time and across subjects. To overcome this problem, several proposed solutions rely on recent transfer learning approaches such as Domain Adaptation (DA). In several cases, however, the actual causes of the improvements remain ambiguous. This paper focuses on the impact of data normalisation, or standardisation, strategies applied together with DA methods. In particular, using the \textit{SEED}, \textit{DEAP}, and \textit{BCI Competition IV 2a} EEG datasets, we experimentally evaluated the impact of different normalisation strategies applied with and without several well-known DA methods, and compared the resulting performance. The results show that the choice of normalisation strategy plays a key role in classifier performance in DA scenarios and, interestingly, in several cases an appropriate normalisation scheme alone outperforms the DA technique.
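To make the role of the normalisation strategy concrete, the following is a minimal sketch rather than the procedure evaluated in this paper. It contrasts two common options under dataset shift: standardising all data with training-set statistics, and standardising each domain (for example, each subject or session) with its own statistics. The function names `zscore_global` and `zscore_per_domain`, the synthetic Gaussian features, and the choice of z-scoring itself are assumptions made purely for illustration.

```python
import numpy as np


def zscore_global(train, test, eps=1e-8):
    """Standardise both sets using statistics from the training set only."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    return (train - mean) / (std + eps), (test - mean) / (std + eps)


def zscore_per_domain(features, domain_ids, eps=1e-8):
    """Standardise each domain (e.g. subject or session) with its own statistics."""
    features = np.asarray(features, dtype=float)
    domain_ids = np.asarray(domain_ids)
    out = np.empty_like(features)
    for d in np.unique(domain_ids):
        mask = domain_ids == d
        block = features[mask]
        out[mask] = (block - block.mean(axis=0)) / (block.std(axis=0) + eps)
    return out


# Synthetic stand-in for EEG features: the target domain is shifted and rescaled.
rng = np.random.default_rng(0)
source = rng.normal(loc=0.0, scale=1.0, size=(200, 4))
target = rng.normal(loc=3.0, scale=2.0, size=(200, 4))

# Option 1: training-set statistics applied to both domains.
global_src, global_tgt = zscore_global(source, target)

# Option 2: each domain standardised independently.
features = np.vstack([source, target])
ids = np.array([0] * 200 + [1] * 200)
per_domain = zscore_per_domain(features, ids)
per_domain_tgt = per_domain[ids == 1]

print("target mean, global stats:    ", global_tgt.mean(axis=0).round(2))
print("target mean, per-domain stats:", per_domain_tgt.mean(axis=0).round(2))
```

In this toy setting, per-domain standardisation removes the first- and second-order mismatch between the two domains before any DA method is applied, whereas training-set statistics leave the target features offset. Per-domain statistics require unlabelled target data to be available at normalisation time, which is itself a design choice to report alongside any DA comparison.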