Atrial fibrillation (AF) is one of the most common arrhythmias and poses a substantial public health challenge. Automatic detection of AF episodes in ECG recordings is therefore an essential task in biomedical engineering. In this paper, we applied the recently introduced method of compressor-based text classification with the gzip algorithm to AF detection (binary classification between heart rhythms). We investigated the normalized compression distance applied to RR-interval and $\Delta$RR-interval sequences (a $\Delta$RR-interval is the difference between successive RR-intervals). We analyzed the configuration of the k-nearest neighbour classifier, the optimal window length, and the choice of data types for compression. When trained on the full MIT-BIH Atrial Fibrillation database, the classifier achieved results close to those of the best specialized AF detection algorithms (avg. sensitivity = 97.1\%, avg. specificity = 91.7\%, best sensitivity of 99.8\%, best specificity of 97.6\% with fivefold cross-validation). In addition, we evaluated the classification performance in the few-shot learning setting. Our results suggest that gzip compression-based classification, originally proposed for texts, is suitable for biomedical data and for quantized continuous stochastic sequences in general.
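As an illustration of the approach summarized above, the following is a minimal sketch of normalized compression distance (NCD) with gzip combined with a k-nearest neighbour vote over RR-interval windows. It is not the authors' implementation. The quantization step (`step_ms`), the space-separated text encoding, the separator used when concatenating two windows, and the helper names `encode_rr`, `ncd`, and `knn_predict` are all assumptions made for illustration. The distance follows the usual definition NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C denotes compressed length.

```python
import gzip
from collections import Counter


def clen(data: bytes) -> int:
    """Length in bytes of the gzip-compressed input."""
    return len(gzip.compress(data))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx, cy = clen(x), clen(y)
    # Joining with a space is an assumption borrowed from text classification.
    cxy = clen(x + b" " + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def encode_rr(rr_ms, step_ms=10, use_delta=False):
    """Quantize a window of RR intervals (in ms) and serialize it as text.

    step_ms and the text encoding are hypothetical choices, not taken
    from the paper. With use_delta=True, differences between successive
    RR intervals (Delta-RR) are encoded instead.
    """
    if use_delta:
        values = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    else:
        values = list(rr_ms)
    return " ".join(str(round(v / step_ms)) for v in values).encode("ascii")


def knn_predict(query: bytes, train, k=1):
    """Majority label among the k training windows closest to query by NCD.

    train is a list of (encoded_window, label) pairs.
    """
    nearest = sorted(train, key=lambda item: ncd(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data only: a perfectly regular window and an irregular one.
regular = encode_rr([800] * 40)
irregular = encode_rr([600 + (i * 137) % 500 for i in range(40)])
train = [(regular, "non-AF"), (irregular, "AF")]
print(knn_predict(irregular, train, k=1))
```

In this sketch each window is compressed from scratch for every comparison, which is quadratic in the number of windows. Caching `clen` for the training windows would be a natural optimization, and the window length, `k`, and the choice between RR and $\Delta$RR encoding would all need to be tuned on real data.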