The Audio-Visual BatVision Dataset for Research on Sight and Sound

Vision research has shown remarkable success in understanding our world, propelled by datasets of images and videos. Sensor data from radar, LiDAR and cameras has supported research in robotics and autonomous driving for at least a decade. However, visual sensors may fail in some conditions, and sound has recently shown potential to complement them. Simulated room impulse responses (RIRs) in 3D apartment models have become a benchmark dataset for the community, fostering a range of audio-visual research. In simulation, depth can be predicted from sound by learning bat-like perception with a neural network. Concurrently, the same was achieved in the real world using RGB-D images and echoes of chirping sounds. Biomimicking bat perception is an exciting new direction, but it needs dedicated datasets to explore its potential. Therefore, we collected the BatVision dataset to provide the community with large-scale echo recordings in complex real-world scenes. We equipped a robot with a speaker to emit chirps and a binaural microphone to record their echoes. Synchronized RGB-D images from the same perspective provide visual labels of the traversed spaces. We sampled locations ranging from modern US office spaces to historic French university grounds, both indoor and outdoor, with large architectural variety. This dataset will allow research on robot echolocation, general audio-visual tasks and sound phenomena unavailable in simulated data. We show promising results for audio-only depth prediction and show how state-of-the-art work developed for simulated data can also succeed on our dataset. The data can be downloaded at https://forms.gle/W6xtshMgoXGZDwsE7
