We present a dataset for evaluating localization algorithms that combines vision, audio, and radio sensors: the Lund University Vision, Radio, and Audio (LuViRA) Dataset. The dataset includes RGB images, corresponding depth maps, IMU readings, channel responses between a massive MIMO channel sounder and a user equipment, audio recorded by 12 microphones, and 6DoF pose ground truth with 0.5 mm accuracy. The sensors are synchronized so that all data are recorded simultaneously. A camera, a speaker, and a transmit antenna are mounted on top of a slowly moving service robot, and 88 trajectories are recorded. Each trajectory contains 20 to 50 seconds of sensor data and ground-truth labels. The data from different sensors can be used separately or jointly for localization tasks, and a motion capture system is used to verify the results obtained by the localization algorithms. The main aim of this dataset is to enable research on fusing the sensors most commonly used for localization. However, the full dataset, or parts of it, can also be used in other research areas such as channel estimation and image classification. Fusing sensor data can increase localization accuracy and reliability, as well as reduce latency and power consumption. The dataset will be made public at a later date.
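Verifying a localization result against motion-capture ground truth requires pairing each estimate with a ground-truth sample recorded at (nearly) the same time. The sketch below is illustrative only: it assumes both the estimates and the ground truth are available as timestamped `(t, x, y, z)` tuples on a common clock, that the ground truth is sorted by time, and that a 10 ms matching tolerance is appropriate. The function names, tuple layout, and tolerance are hypothetical and do not describe the dataset's actual file format. Only the position part of the 6DoF pose is compared.

```python
import bisect
import math


def nearest_index(sorted_times, t):
    """Return the index of the timestamp in sorted_times closest to t."""
    i = bisect.bisect_left(sorted_times, t)
    if i == 0:
        return 0
    if i == len(sorted_times):
        return len(sorted_times) - 1
    before, after = sorted_times[i - 1], sorted_times[i]
    # Pick whichever neighbouring ground-truth timestamp is closer to t.
    return i if (after - t) < (t - before) else i - 1


def position_rmse(est, gt, max_dt=0.01):
    """Hypothetical helper: positional RMSE of estimates vs. ground truth.

    est, gt: lists of (timestamp_s, x, y, z); gt must be sorted by time.
    Each estimate is paired with the nearest ground-truth sample; pairs more
    than max_dt seconds apart are skipped. Returns (rmse, matched_count).
    """
    gt_times = [g[0] for g in gt]
    sq_errors = []
    for t, x, y, z in est:
        tg, xg, yg, zg = gt[nearest_index(gt_times, t)]
        if abs(tg - t) > max_dt:
            continue  # no ground-truth sample close enough in time
        sq_errors.append((x - xg) ** 2 + (y - yg) ** 2 + (z - zg) ** 2)
    if not sq_errors:
        raise ValueError("no estimates matched ground truth within max_dt")
    return math.sqrt(sum(sq_errors) / len(sq_errors)), len(sq_errors)


if __name__ == "__main__":
    gt = [(0.00, 0.0, 0.0, 0.0), (0.01, 0.1, 0.0, 0.0), (0.02, 0.2, 0.0, 0.0)]
    est = [(0.00, 0.0, 0.0, 0.0), (0.02, 0.2, 0.3, 0.4)]
    print(position_rmse(est, gt))
```

Nearest-timestamp matching is the simplest choice; interpolating the ground-truth pose between neighbouring samples would be a reasonable refinement when estimates fall between motion-capture frames.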