As AI agents become the primary consumers of retrieval APIs, there is an opportunity to expose more of the retrieval pipeline to the caller. flexvec is a retrieval kernel that exposes the embedding matrix and score array as a programmable surface, allowing arithmetic operations on both before selection. We refer to composing operations on this surface at query time as Programmatic Embedding Modulation (PEM). This paper describes a set of such operations and integrates them into a SQL interface via a query materializer, so that they can be used as composable query primitives. On a production corpus of 240,000 chunks, three composed modulations execute in 19 ms end-to-end on a desktop CPU without approximate indexing. At one million chunks, the same operations execute in 82 ms.
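To make the idea of arithmetic on the embedding matrix and score array before selection concrete, the following is a minimal NumPy sketch. It is not flexvec's API: the function name `modulated_top_k`, its parameters, and the two modulations chosen (projecting a direction out of the matrix, and elementwise reweighting of scores) are assumptions made purely for illustration. It assumes the rows of the embedding matrix are unit-normalized and uses exact brute-force scoring, with no approximate index.

```python
import numpy as np

def modulated_top_k(E, q, k=3, remove_dir=None, weights=None):
    """Exact top-k search with two illustrative modulations.

    E          : (n, d) embedding matrix, rows assumed unit-norm
    q          : (d,) query vector
    remove_dir : optional (d,) direction projected out of every row of E
    weights    : optional (n,) multipliers applied to the score array
    All names here are hypothetical, not part of flexvec.
    """
    if remove_dir is not None:
        # Matrix-side modulation: remove each row's component along remove_dir.
        d = remove_dir / np.linalg.norm(remove_dir)
        E = E - np.outer(E @ d, d)

    # Brute-force scoring: one matrix-vector product over the whole corpus.
    scores = E @ (q / np.linalg.norm(q))

    if weights is not None:
        # Score-side modulation: elementwise reweighting (e.g. a recency decay).
        scores = scores * weights

    # Selection happens last, on the modulated scores.
    k = min(k, scores.shape[0])
    top = np.argpartition(-scores, k - 1)[:k]
    order = top[np.argsort(-scores[top])]
    return order, scores
```

Because both modulations are plain array operations applied before the final selection step, they compose freely: any number of matrix-side or score-side adjustments can be chained ahead of a single top-k pass.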