We present the Lund University Vision, Radio, and Audio (LuViRA) Dataset, a dataset for evaluating localization algorithms that use vision, audio, and radio sensors. The dataset includes RGB images, corresponding depth maps, IMU readings, the channel response between a massive MIMO channel sounder and a user equipment, audio recorded by 12 microphones, and 6DoF pose ground truth accurate to 0.5 mm. The sensors are synchronized so that all data are recorded simultaneously. A camera, a speaker, and a transmit antenna are mounted on top of a slowly moving service robot, and 88 trajectories are recorded. Each trajectory includes 20 to 50 seconds of recorded sensor data and ground truth labels. The data from the different sensors can be used separately or jointly for localization tasks, and a motion capture system is used to verify the results obtained by the localization algorithms. The main aim of this dataset is to enable research on fusing the most commonly used sensors for localization tasks. The full dataset, or parts of it, can also be used in other research areas such as channel estimation and image classification. Fusing sensor data can lead to increased localization accuracy and reliability, as well as decreased latency and power consumption. The dataset will be made public at a later date.