In recent years, much work has applied machine learning to wireless spectrum data to address problems in cognitive radio networks, such as anomaly detection, modulation classification, technology classification, and device fingerprinting. Most of these solutions rely on labeled data, created in a controlled manner and processed with supervised learning approaches. However, spectrum data measured in real-world environments is highly nondeterministic, which makes labeling it a laborious and expensive process that requires domain expertise; this is one of the main drawbacks of supervised learning in this domain. In this paper, we investigate the use of self-supervised learning (SSL) for exploring spectrum activities in real-world unlabeled data. In particular, we compare the performance of two SSL models, one based on a reference DeepCluster architecture and one adapted for spectrum activity identification and clustering, against a baseline model based on the K-means clustering algorithm. We show that the SSL models achieve superior performance in terms of both the quality of the extracted features and the clustering performance. With the SSL models, we reduce the feature vector size by two orders of magnitude while improving performance by a factor of 2 to 2.5 across the evaluation metrics, a result supported by visual assessment. Additionally, we show that adapting the reference SSL architecture to the domain data reduces model complexity by one order of magnitude while preserving or even improving the clustering performance.
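To make the baseline concrete, the following is a minimal sketch of K-means clustering applied directly to flattened spectrum segments, which is one plausible reading of the baseline described above. It is not the authors' pipeline: the segment shape (8 x 8), the synthetic "idle" and "active" segments, and the helper names `make_segment` and `kmeans` are all assumptions introduced for illustration only.

```python
import random


def make_segment(rng, active, n_freq=8, n_time=8):
    """Hypothetical toy spectrogram segment, flattened to a list of powers.

    Idle segments are low-level noise; active segments add a constant
    power offset in frequency bins 2..4 to mimic a transmission.
    """
    seg = []
    for f in range(n_freq):
        for _t in range(n_time):
            power = rng.gauss(0.0, 0.1)
            if active and 2 <= f <= 4:
                power += 1.0
            seg.append(power)
    return seg


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's K-means; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: centroid becomes the mean of its members
        # (an empty cluster keeps its previous centroid).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Synthetic data only: alternate idle (0) and active (1) segments.
data_rng = random.Random(42)
truth = [i % 2 for i in range(40)]
segments = [make_segment(data_rng, bool(a)) for a in truth]

labels, centroids = kmeans(segments, k=2)
print("feature vector length:", len(segments[0]))
print("cluster sizes:", [labels.count(c) for c in range(2)])
```

In this sketch the clustering operates on the raw flattened segment, so the feature vector is as long as the segment itself. An SSL model would instead cluster a much shorter learned embedding, which is where the reported reduction in feature vector size comes from.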