Received Signal Strength Indicator (RSSI) estimation is essential for wireless link management, yet conventional feedback-based approaches incur uplink overhead, suffer from measurement instability, and are subject to inherent feedback-loop latency, which makes proactive adaptation infeasible. Vision-based approaches have been explored, but existing methods depend on specific hardware or auxiliary inputs and lack the spatial diversity needed to resolve non-line-of-sight (NLoS) conditions at the camera side. To address these limitations, we propose MulViT-TF, a vision-only RSSI estimation framework that fuses distributed multi-view observations with a Transformer, achieving complementary spatial coverage without any auxiliary sensing inputs. Experiments across two distinct indoor scenes show that MulViT-TF reduces RMSE by up to 26.3% and improves 3 dB error coverage by up to 13.8 percentage points over the best-performing single-view baseline, while using fewer FLOPs and parameters.
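The abstract reports RMSE and 3 dB error coverage without defining them. As a minimal sketch, assuming that 3 dB error coverage means the fraction of estimates whose absolute error is at most 3 dB (an interpretation, not a definition taken from the paper), the two metrics and the relative reduction could be computed as follows. The function names and sample values are hypothetical and for illustration only.

```python
import math


def rmse(pred_dbm, true_dbm):
    """Root-mean-square error between predicted and measured RSSI (dBm)."""
    if len(pred_dbm) != len(true_dbm) or not pred_dbm:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - t) ** 2 for p, t in zip(pred_dbm, true_dbm)]
    return math.sqrt(sum(squared) / len(squared))


def error_coverage(pred_dbm, true_dbm, tol_db=3.0):
    """Fraction of samples with absolute error <= tol_db.

    Assumed definition of "3 dB error coverage"; the paper may differ.
    """
    if len(pred_dbm) != len(true_dbm) or not pred_dbm:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for p, t in zip(pred_dbm, true_dbm) if abs(p - t) <= tol_db)
    return hits / len(pred_dbm)


def relative_reduction(baseline, improved):
    """Relative reduction of `improved` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - improved) / baseline


# Hypothetical sample values, not data from the paper.
predicted = [-50.0, -60.0, -71.0, -65.0]
measured = [-53.0, -56.0, -70.0, -65.5]

sample_rmse = rmse(predicted, measured)
sample_coverage = error_coverage(predicted, measured)
```

Note that RMSE improvements are relative (percent), whereas coverage improvements in the abstract are stated in percentage points, that is, as an absolute difference between two coverage values.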