Collecting real-world data is often considered the bottleneck of Artificial Intelligence, stalling research progress in several fields, one of which is camera localization. End-to-end camera localization methods are still outperformed by traditional methods, and we argue that inconsistencies associated with data collection techniques are restraining the potential of end-to-end methods. Inspired by the recent data-centric paradigm, we propose a framework that synthesizes large localization datasets from realistic 3D reconstructions of the real world. Our framework, termed Synfeal (Synthetic from Real), is an open-source, data-driven simulator that synthesizes RGB images by moving a virtual camera through a realistic 3D textured mesh while collecting the corresponding ground-truth camera poses. The results show that training camera localization algorithms on datasets generated by Synfeal leads to better results than training on datasets generated by state-of-the-art methods. Using Synfeal, we conducted the first analysis of the relationship between dataset size and the performance of camera localization algorithms. Results show that performance increases significantly with dataset size. Our results also suggest that when a large, high-quality localization dataset is available, training from scratch leads to better performance. Synfeal is publicly available at https://github.com/DanielCoelho112/synfeal.