AASM guidelines are the result of decades of effort to standardize the sleep scoring procedure, with the ultimate goal of establishing a common methodology worldwide. The guidelines cover several aspects, from technical and digital specifications (e.g., recommended EEG derivations) to detailed age-specific sleep scoring rules. Automated sleep scoring systems have always relied heavily on these standards as fundamental guidelines. In this context, deep learning has demonstrated better performance than classical machine learning. Our present work shows that a deep learning based sleep scoring algorithm may not need to fully exploit clinical knowledge or strictly adhere to the AASM guidelines. Specifically, we demonstrate that U-Sleep, a state-of-the-art sleep scoring algorithm, can be robust enough to solve the scoring task even when using clinically non-recommended or non-conventional derivations, and without any information about the chronological age of the subjects. Finally, we reinforce the well-known finding that training on data from multiple data centers always results in a better-performing model than training on a single cohort. Indeed, we show that this still holds even when the size and heterogeneity of the single data cohort are increased. In all our experiments we used 28528 polysomnography studies from 13 different clinical studies.