This paper presents TeFS (Temporal-controlled Frame Swap), a novel approach for generating synthetic stereo driving data for visual simultaneous localization and mapping (vSLAM) tasks. TeFS is designed to overcome the lack of native stereo vision support in commercial driving simulators, and we demonstrate its effectiveness using Grand Theft Auto V (GTA V), a high-budget open-world video game. We introduce GTAV-TeFS, the first large-scale GTA V stereo-driving dataset, containing over 88,000 high-resolution stereo RGB image pairs together with temporal information, GPS coordinates, camera poses, and full-resolution dense depth maps. GTAV-TeFS offers several advantages over other synthetic stereo datasets and enables the evaluation and enhancement of state-of-the-art stereo vSLAM models in GTA V's environment. We validate the quality of the stereo data collected with TeFS through a comparative analysis against conventional dual-viewport data, conducted using an open-source simulator. We also benchmark various vSLAM models on the challenging-case comparison groups included in GTAV-TeFS, revealing the distinct advantages and limitations of each model. The goal of our work is to bring more high-fidelity stereo data from commercial-grade game simulators into the research domain and to push the boundaries of vSLAM models.