This paper presents TeFS (Temporal-controlled Frame Swap), a novel approach for generating synthetic stereo driving data for visual simultaneous localization and mapping (vSLAM) tasks. TeFS is designed to overcome the lack of native stereo vision support in commercial driving simulators, and we demonstrate its effectiveness using Grand Theft Auto V (GTA V), a high-budget open-world video game. We introduce GTAV-TeFS, the first large-scale GTA V stereo driving dataset, containing over 88,000 high-resolution stereo RGB image pairs along with temporal information, GPS coordinates, camera poses, and full-resolution dense depth maps. GTAV-TeFS offers several advantages over other synthetic stereo datasets and enables the evaluation and enhancement of state-of-the-art stereo vSLAM models in GTA V's environment. We validate the quality of the stereo data collected with TeFS through a comparative analysis against conventional dual-viewport data captured in an open-source simulator. We also benchmark various vSLAM models on the challenging-case comparison groups included in GTAV-TeFS, revealing the distinct advantages and limitations of each model. The goal of our work is to bring more high-fidelity stereo data from commercial-grade game simulators into the research domain and to push the boundaries of vSLAM models.