Open-vocabulary object detectors can recognize a wide range of categories from simple textual prompts. However, improving their ability to detect rare classes or to specialize in particular domains remains a challenge. While most recent methods adapt a single set of model weights, we take a different approach based on modular deep learning. We introduce DitHub, a framework designed to create and manage a library of efficient adaptation modules. Inspired by Version Control Systems, DitHub organizes expert modules like branches that can be fetched and merged as needed. This modular design enables a detailed study of how adaptation modules combine, making DitHub the first method to explore this aspect in object detection. Our approach achieves state-of-the-art performance on the ODinW-13 benchmark and on ODinW-O, a newly introduced benchmark designed to evaluate how well models adapt when previously seen classes reappear. For more details, visit our project page: https://aimagelab.github.io/DitHub/