Polysomnography (PSG) data is recorded and stored in various formats, depending on the recording software. Although PSG data can usually be exported to open formats such as the European Data Format (EDF), these formats are limited in their supported data types, validation, and readability. Moreover, exported data is not harmonized, so each dataset needs customized preprocessing before research can be conducted across multiple datasets. In this work, we designed and implemented the Sleeplab format (SLF), an open format for storing and processing PSG data that is both human- and machine-readable and has built-in validation of both data types and structures. SLF provides tools for reading, writing, and compressing PSG datasets. In addition, SLF promotes harmonization of data from different sources, which reduces the work needed to apply the same analytics pipelines to different datasets. SLF is interoperable because it uses the file system and commonly used file formats to store the data. The goal of developing SLF was to enable fast exploration and experimentation on PSG data and to streamline the workflow of building analytics and machine learning applications that combine PSG data from multiple sources. The performance of SLF was tested with two open datasets in different formats (EDF and HDF5). SLF is fully open source and available at https://github.com/UEF-SmartSleepLab/sleeplab-format.