Machine learning techniques have shown remarkable accuracy in localization tasks, but their dependence on large amounts of labeled data, specifically Channel State Information (CSI) paired with the corresponding coordinates, remains a bottleneck. Self-supervised learning reduces the need for labeled data, yet its potential remains largely unexplored in existing research. To address this gap, we propose an approach that uses self-supervised pretraining on unlabeled data to improve the performance of supervised learning for CSI-based user localization. We introduce two pretraining autoencoder (AE) models employing Multi-Layer Perceptrons (MLPs) and Convolutional Neural Networks (CNNs) to learn representations from unlabeled data via self-supervised learning. We then use the encoder portion of the AE models to extract relevant features from labeled data and fine-tune an MLP-based Position Estimation Model to estimate user locations. Experiments on the CTW-2020 dataset, which contains a large volume of unlabeled data but few labeled samples, demonstrate the viability of our approach. Notably, the dataset covers a large area spanning 646 × 943 × 41 meters, and our approach shows promising results even for localization at this scale.
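The two-stage pipeline described in the abstract (self-supervised autoencoder pretraining on unlabeled CSI, followed by supervised position estimation on encoder features) might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data is synthetic rather than CTW-2020, the helper names (`fake_csi`, `encode`, `pretrain_autoencoder`, `fit_position_head`, `predict_position`) and all layer sizes are hypothetical, the autoencoder is a single-hidden-layer MLP trained with plain gradient descent, and a linear least-squares head stands in for the MLP-based Position Estimation Model, with the encoder kept frozen.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for CSI (NOT the CTW-2020 data): a fixed random nonlinear
# map turns 3-D positions into 32-D feature vectors.
W_true = rng.normal(size=(3, 32))

def fake_csi(pos):
    return np.tanh(pos @ W_true)

def encode(x, params):
    """Apply the pretrained encoder (one tanh layer) to input features."""
    We, be = params
    return np.tanh(x @ We + be)

def pretrain_autoencoder(x, latent_dim=8, lr=0.1, epochs=300):
    """Self-supervised stage: fit a small MLP autoencoder to unlabeled x."""
    n, d = x.shape
    We = rng.normal(scale=0.1, size=(d, latent_dim)); be = np.zeros(latent_dim)
    Wd = rng.normal(scale=0.1, size=(latent_dim, d)); bd = np.zeros(d)
    losses = []
    for _ in range(epochs):
        z = np.tanh(x @ We + be)                 # encoder
        err = (z @ Wd + bd) - x                  # linear decoder minus target
        losses.append(float(np.mean(err ** 2)))  # reconstruction MSE
        g_out = 2.0 * err / err.size             # dLoss / d(reconstruction)
        g_pre = (g_out @ Wd.T) * (1.0 - z ** 2)  # backpropagate through tanh
        Wd -= lr * (z.T @ g_out); bd -= lr * g_out.sum(axis=0)
        We -= lr * (x.T @ g_pre); be -= lr * g_pre.sum(axis=0)
    return (We, be), losses

def fit_position_head(z, pos):
    """Supervised stage: least-squares linear head on frozen encoder features.
    (A simplification; the abstract describes an MLP-based head.)"""
    A = np.hstack([z, np.ones((len(z), 1))])     # append a bias column
    coef, *_ = np.linalg.lstsq(A, pos, rcond=None)
    return coef

def predict_position(z, coef):
    return np.hstack([z, np.ones((len(z), 1))]) @ coef

# Many unlabeled samples (coordinates discarded), few labeled ones.
x_unlabeled = fake_csi(rng.uniform(-1, 1, size=(2000, 3)))
pos_train = rng.uniform(-1, 1, size=(100, 3)); x_train = fake_csi(pos_train)
pos_test = rng.uniform(-1, 1, size=(200, 3)); x_test = fake_csi(pos_test)

encoder, losses = pretrain_autoencoder(x_unlabeled)
head = fit_position_head(encode(x_train, encoder), pos_train)
pred = predict_position(encode(x_test, encoder), head)
rmse = float(np.sqrt(np.mean(np.sum((pred - pos_test) ** 2, axis=1))))
print(f"reconstruction MSE: {losses[0]:.3f} -> {losses[-1]:.3f}; test RMSE: {rmse:.3f}")
```

The point of the split is that `pretrain_autoencoder` never sees coordinates, so it can use every unlabeled sample, while the position head is fitted only on the small labeled set. Numbers printed by this sketch reflect the synthetic data only and say nothing about results on CTW-2020.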