We present a unique comparative analysis and evaluation of vision-, radio-, and audio-based localization algorithms. We create the first baseline for these sensors using the recently published Lund University Vision, Radio, and Audio (LuViRA) dataset, in which all sensors are synchronized and measured in the same environment. We highlight some of the challenges of using each sensor for indoor localization tasks. Each sensor is paired with a current state-of-the-art localization algorithm and evaluated along several dimensions: localization accuracy, reliability and sensitivity to environment changes, calibration requirements, and potential system complexity. Specifically, the evaluation covers the ORB-SLAM3 algorithm for vision-based localization with an RGB-D camera, a machine-learning algorithm for radio-based localization with massive MIMO technology, and the SFS2 algorithm for audio-based localization with distributed microphones. The results can serve as a guideline and basis for further development of robust and high-precision multi-sensory localization systems, e.g., through sensor fusion and context- and environment-aware adaptation.