For object re-identification (re-ID), learning from synthetic data has become a promising strategy for cheaply acquiring large-scale annotated datasets and effective models with few privacy concerns. Many interesting research problems arise from this strategy, e.g., how to reduce the domain gap between the synthetic source and the real-world target. To facilitate the development of new approaches to learning from synthetic data, we introduce the Alice benchmarks, a suite of large-scale datasets and evaluation protocols for the research community. The Alice benchmarks cover two object re-ID tasks: person and vehicle re-ID. We collected and annotated two challenging real-world target datasets, AlicePerson and AliceVehicle, captured under varying illumination, image resolution, and other conditions. An important feature of our real targets is that the clusterability of their training sets is not manually guaranteed, which brings them closer to a real domain adaptation test scenario. Correspondingly, we reuse the existing PersonX and VehicleX as synthetic source domains. The primary goal is to train models on synthetic data that work effectively in the real world. In this paper, we detail the settings of the Alice benchmarks, analyze commonly used domain adaptation methods, and discuss some interesting future directions. An online server will be set up so that the community can evaluate methods conveniently and fairly.