We introduce LOCORE, Long-Context Re-ranker, a model that takes as input local descriptors corresponding to an image query and a list of gallery images and outputs similarity scores between the query and each gallery image. This model is used for image retrieval, where typically a first ranking is performed with an efficient similarity measure, and then a shortlist of top-ranked images is re-ranked based on a more fine-grained similarity measure. Compared to existing methods that perform pair-wise similarity estimation with local descriptors or list-wise re-ranking with global descriptors, LOCORE is the first method to perform list-wise re-ranking with local descriptors. To achieve this, we leverage efficient long-context sequence models to effectively capture the dependencies between query and gallery images at the local-descriptor level. During testing, we process long shortlists with a sliding window strategy that is tailored to overcome the context size limitations of sequence models. Our approach achieves superior performance compared with other re-rankers on established image retrieval benchmarks of landmarks (ROxf and RPar), products (SOP), fashion items (In-Shop), and bird species (CUB-200) while having comparable latency to the pair-wise local descriptor re-rankers.
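The sliding-window strategy described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the scorer `score_window` (which stands in for a list-wise model that fits one window of gallery images in its context) and the max-over-windows aggregation are assumptions for the sake of the example.

```python
def rerank_sliding_window(score_window, query, shortlist, window=10, stride=5):
    """Re-rank a long shortlist with overlapping windows.

    score_window(query, items) must return one similarity score per item;
    it models a list-wise re-ranker whose context holds at most `window` images.
    Overlapping windows (stride < window) let every image be scored alongside
    several neighborhoods; we keep each image's best score across windows.
    """
    best = {}
    n = len(shortlist)
    for start in range(0, n, stride):
        items = shortlist[start:start + window]
        if not items:
            break
        scores = score_window(query, items)
        for item, s in zip(items, scores):
            best[item] = max(best.get(item, float("-inf")), s)
        if start + window >= n:  # last window already reached the end
            break
    # Final ranking: sort the shortlist by aggregated score, descending.
    return sorted(shortlist, key=lambda g: best[g], reverse=True)
```

With stride no larger than the window size, consecutive windows cover the shortlist contiguously, so every image receives at least one score before the final sort.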