In this report, we present the first-place solution to the ECCV 2024 BRAVO Challenge, in which a model is trained on Cityscapes and its robustness is evaluated on several out-of-distribution datasets. Our solution leverages the powerful representations learned by vision foundation models: we attach a simple segmentation decoder to DINOv2 and fine-tune the entire model. This approach outperforms more complex existing approaches and achieves first place in the challenge. Our code is publicly available at https://github.com/tue-mps/benchmark-vfm-ss.