With continued advances in computing power and deep learning algorithms, foundation models have grown in popularity in recent years. Owing to their strong capabilities and performance, they are being adopted by an increasing number of industries. In the intelligent transportation industry, artificial intelligence faces three typical challenges: scarce training samples (few-shot learning), poor generalization, and a lack of multi-modal techniques. Foundation model technology can significantly alleviate these issues. To this end, we designed the 1st Foundation Model Challenge, with the goal of popularizing foundation model technology in traffic scenarios and promoting the rapid development of the intelligent transportation industry. The challenge is divided into two tracks: all-in-one and cross-modal image retrieval. Furthermore, we provide a new baseline and benchmark for the two tracks, called Open-TransMind. To the best of our knowledge, Open-TransMind is the first open-source transportation foundation model with multi-task and multi-modal capabilities. In addition, Open-TransMind achieves state-of-the-art performance on detection, classification, and segmentation datasets of traffic scenarios. Our source code is available at https://github.com/Traffic-X/Open-TransMind.