Modern neural encoders offer unprecedented text-image retrieval (TIR) accuracy, but their high computational cost impedes their adoption in large-scale image search. To lower this cost, model cascades use an expensive encoder to refine the ranking produced by a cheap encoder. However, existing cascading algorithms focus on cross-encoders, which jointly process text-image pairs, and do not consider cascades of bi-encoders, which process texts and images separately. We introduce the small-world search scenario as a realistic setting in which bi-encoder cascades can reduce costs. We then propose a cascading algorithm that leverages the small-world search scenario to reduce the lifetime image encoding costs of a TIR system. Our experiments show cost reductions of up to 6x.
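To make the cascade idea concrete, the following is a minimal sketch of a two-level bi-encoder cascade, not the algorithm proposed in the paper. It assumes that the cheap encoder embeds every image once up front, and that the expensive encoder embeds an image only the first time that image reaches the cheap encoder's top-k, after which the embedding is cached. The lazy caching is an assumption made here to illustrate how lifetime image encoding costs might fall when queries keep returning the same small set of images; the abstract does not describe the mechanism. The class `BiEncoderCascade`, its parameters, and the four encoder callables are hypothetical names.

```python
import numpy as np


class BiEncoderCascade:
    """Illustrative two-level bi-encoder cascade (all names are hypothetical)."""

    def __init__(self, images, cheap_text, cheap_image,
                 costly_text, costly_image, top_k=10):
        self.images = list(images)
        self.cheap_text = cheap_text
        self.costly_text = costly_text
        self.costly_image = costly_image
        self.top_k = top_k
        # Every image is embedded once with the cheap encoder at build time.
        self.cheap_index = np.stack([cheap_image(img) for img in self.images])
        # Assumption: expensive image embeddings are computed lazily and cached.
        self.costly_cache = {}

    def search(self, query):
        # Stage 1: rank all images by cheap-encoder similarity (dot product).
        cheap_scores = self.cheap_index @ self.cheap_text(query)
        k = min(self.top_k, len(self.images))
        candidates = np.argsort(-cheap_scores, kind="stable")[:k]

        # Stage 2: re-rank only the top-k candidates with the expensive encoder.
        query_vec = self.costly_text(query)
        rescored = []
        for idx in candidates:
            idx = int(idx)
            if idx not in self.costly_cache:
                # The expensive encoder runs at most once per image.
                self.costly_cache[idx] = self.costly_image(self.images[idx])
            rescored.append((float(self.costly_cache[idx] @ query_vec), idx))

        # Highest expensive-encoder score first; ties keep the cheap order.
        rescored.sort(key=lambda pair: -pair[0])
        return [idx for _, idx in rescored]
```

In this sketch, the number of expensive image encodings is bounded by the number of distinct images that ever enter a top-k list, rather than by the size of the collection. If most queries touch only a small fraction of the images, most images are never passed through the expensive encoder.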