The ever-increasing size of language models limits their accessibility to the wider community, which has prompted many companies and startups to offer access to large language models through APIs. One such API, well suited to dense retrieval, is the semantic embedding API, which builds vector representations of a given text. With a growing number of these APIs available, our goal in this paper is to analyze semantic embedding APIs in realistic retrieval scenarios, helping practitioners and researchers find services that suit their needs. Specifically, we investigate the capabilities of existing APIs for domain generalization and multilingual retrieval. To this end, we evaluate the embedding APIs on two standard benchmarks, BEIR and MIRACL. We find that using the APIs to re-rank BM25 results is a budget-friendly approach and is most effective on English. This contrasts with the standard practice of employing them as first-stage retrievers. For non-English retrieval, re-ranking still improves results, but a hybrid of BM25 and the embedding APIs works best, albeit at a higher cost. We hope our work lays the groundwork for thoroughly evaluating APIs that are critical in search and, more broadly, in information retrieval.
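The two strategies contrasted above, re-ranking BM25 candidates with embeddings and fusing BM25 with dense scores, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation. The vectors are toy stand-ins for what an embedding API would return, and the function names (`rerank`, `hybrid`), the cutoff `k`, the min-max normalization, and the interpolation weight `alpha` are hypothetical choices. The abstract does not specify how the hybrid scores are fused.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def rerank(query_vec, candidates, doc_vecs, k=100):
    """Re-rank the top-k BM25 candidates by embedding similarity.

    candidates: list of (doc_id, bm25_score), sorted by BM25 score descending.
    doc_vecs: doc_id -> embedding (assumed to come from an embedding API).
    Only the k candidates need embeddings, not the whole corpus.
    """
    top = candidates[:k]
    scored = [(doc_id, cosine(query_vec, doc_vecs[doc_id])) for doc_id, _ in top]
    return sorted(scored, key=lambda x: x[1], reverse=True)


def min_max(scores):
    """Rescale a doc_id -> score mapping to [0, 1] (all zeros if scores are constant)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 0.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}


def hybrid(bm25_scores, dense_scores, alpha=0.5):
    """Linearly interpolate normalized BM25 and dense scores (hypothetical fusion).

    A document missing from one score list contributes 0.0 for that list.
    """
    b = min_max(bm25_scores)
    d = min_max(dense_scores)
    fused = {
        doc: alpha * b.get(doc, 0.0) + (1 - alpha) * d.get(doc, 0.0)
        for doc in set(b) | set(d)
    }
    return sorted(fused.items(), key=lambda x: x[1], reverse=True)


if __name__ == "__main__":
    # Toy data: 2-d vectors stand in for API embeddings.
    query_vec = [1.0, 0.0]
    doc_vecs = {"d1": [0.0, 1.0], "d2": [1.0, 0.1], "d3": [0.7, 0.7]}
    bm25 = [("d1", 12.0), ("d2", 9.0), ("d3", 7.5)]

    reranked = rerank(query_vec, bm25, doc_vecs, k=3)
    print([d for d, _ in reranked])  # ['d2', 'd3', 'd1']

    # In a real hybrid, dense scores would come from a first-stage dense search
    # over the full corpus; here the reranker's scores are reused for brevity.
    fused = hybrid(dict(bm25), dict(reranked), alpha=0.5)
    print([d for d, _ in fused])  # ['d2', 'd1', 'd3']
```

Re-ranking only needs embeddings for the k candidates retrieved by BM25, whereas a first-stage or hybrid setup needs an embedding for every document in the corpus. That difference is presumably the main source of the cost gap noted above.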