AI systems must adapt to evolving visual environments, especially in domains where object appearances change over time. We introduce Car Models in Time (CaMiT), a fine-grained dataset capturing the temporal evolution of car models, a representative class of technological artifacts. CaMiT includes 787K labeled samples of 190 car models (2007-2023) and 5.1M unlabeled samples (2005-2023), supporting both supervised and self-supervised learning. Static pretraining on in-domain data achieves performance competitive with large-scale generalist models while being more resource-efficient, yet accuracy declines when models are tested on years other than those they were trained on. To address this, we propose a time-incremental classification setting, a realistic continual learning scenario with emerging, evolving, and disappearing classes. We evaluate two strategies: time-incremental pretraining, which updates the backbone, and time-incremental classifier learning, which updates only the final layer. Both improve temporal robustness. Finally, we explore time-aware image generation that leverages temporal metadata during training, yielding more realistic outputs. CaMiT offers a rich benchmark for studying temporal adaptation in fine-grained visual recognition and generation.
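To make the time-incremental classifier learning idea concrete, the following is a minimal sketch, not the method used in the paper. It assumes features come from a frozen backbone and swaps in a nearest-class-mean classifier as a stand-in for "updating only the final layer"; the class name `TimeIncrementalNCM`, the labels `model_a` and `model_b`, and the toy 2-D features are all hypothetical.

```python
import numpy as np


class TimeIncrementalNCM:
    """Nearest-class-mean classifier over frozen features, updated year by year.

    Illustrative assumption: the abstract does not specify the classifier.
    """

    def __init__(self):
        self.sums = {}    # class label -> running sum of feature vectors
        self.counts = {}  # class label -> number of samples seen so far

    def update(self, features, labels):
        """Fold one year's labeled features into the running class means."""
        features = np.asarray(features, dtype=float)
        for x, y in zip(features, labels):
            if y not in self.sums:  # a class emerging in this year
                self.sums[y] = np.zeros_like(x)
                self.counts[y] = 0
            self.sums[y] += x
            self.counts[y] += 1

    def predict(self, features):
        """Assign each feature vector to the class with the closest mean."""
        features = np.asarray(features, dtype=float)
        classes = list(self.sums)
        means = np.stack([self.sums[c] / self.counts[c] for c in classes])
        # Pairwise Euclidean distances, shape (n_samples, n_classes).
        dists = np.linalg.norm(features[:, None, :] - means[None, :, :], axis=2)
        return [classes[i] for i in dists.argmin(axis=1)]


# Toy stream: model_a drifts between years, model_b emerges in the second year.
stream = {
    2007: ([[0.0, 0.0], [0.2, 0.0]], ["model_a", "model_a"]),
    2008: ([[1.0, 0.0], [1.2, 0.0], [5.0, 5.0], [5.0, 5.2]],
           ["model_a", "model_a", "model_b", "model_b"]),
}

clf = TimeIncrementalNCM()
for year in sorted(stream):
    feats, labels = stream[year]
    clf.update(feats, labels)  # only the classifier changes; the backbone stays fixed
    print(year, sorted(clf.counts))
```

Because only per-class sums and counts are stored, each yearly update is cheap and no past images need to be retained. A real setup would also need a policy for evolving and disappearing classes, for example down-weighting old samples, which this sketch leaves out.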