Recently, Deep Learning (DL) techniques have been used for User Equipment (UE) positioning. However, the key shortcomings of such models are that: i) they assign equal attention to the entire input; and ii) they are not well suited to non-sequential data, e.g., when only instantaneous Channel State Information (CSI) is available. In this context, we propose an attention-based Vision Transformer (ViT) architecture that focuses on the Angle Delay Profile (ADP) derived from the CSI matrix. Our approach, validated on the `DeepMIMO' and `ViWi' ray-tracing datasets, achieves a Root Mean Squared Error (RMSE) of 0.55 m indoors and 13.59 m outdoors on DeepMIMO, and 3.45 m on ViWi's outdoor blockage scenario. The proposed scheme outperforms state-of-the-art schemes by $\sim$38\%, and it also yields a substantially better error-distance distribution than the other approaches we consider.
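As a minimal sketch of the pipeline the abstract describes (assuming PyTorch): the ADP is taken here as the magnitude of the 2-D DFT of the CSI matrix, which is a common definition in CSI-based positioning, and a small ViT-style encoder regresses the UE position from it. The architecture, hyperparameters, and tensor sizes below are illustrative assumptions, not the paper's exact model.

\begin{verbatim}
import torch
import torch.nn as nn

def csi_to_adp(H: torch.Tensor) -> torch.Tensor:
    # ADP as the magnitude of the 2-D DFT of the complex CSI matrix
    # (antennas x subcarriers -> angle x delay bins); an assumed definition.
    return torch.fft.fft2(H).abs()

class ViTRegressor(nn.Module):
    # Tiny ViT-style encoder regressing a 2-D UE position from an ADP
    # "image"; all sizes are illustrative, not the paper's configuration.
    def __init__(self, img_size=64, patch=8, dim=64, depth=4, heads=4,
                 out_dim=2):
        super().__init__()
        n_patches = (img_size // patch) ** 2
        self.patch_embed = nn.Conv2d(1, dim, kernel_size=patch, stride=patch)
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos = nn.Parameter(torch.zeros(1, n_patches + 1, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           dim_feedforward=4 * dim,
                                           batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, out_dim)  # (x, y) position estimate

    def forward(self, adp):  # adp: (B, 1, img_size, img_size)
        x = self.patch_embed(adp).flatten(2).transpose(1, 2)  # (B, N, dim)
        x = torch.cat([self.cls.expand(x.size(0), -1, -1), x], dim=1)
        x = self.encoder(x + self.pos)
        return self.head(x[:, 0])  # regress from the [CLS] token

# Example: one random CSI snapshot -> ADP -> position estimate.
H = torch.randn(64, 64, dtype=torch.cfloat)    # 64 antennas x 64 subcarriers
adp = csi_to_adp(H).unsqueeze(0).unsqueeze(0)  # (1, 1, 64, 64)
print(ViTRegressor()(adp).shape)               # torch.Size([1, 2])
\end{verbatim}

The attention layers let the model weight informative angle-delay bins unevenly, which is the property the abstract contrasts with prior DL models that treat the whole input uniformly.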