Large language models (LLMs) have recently enabled remarkable progress in text representation. However, their embeddings are typically high-dimensional, leading to substantial storage and retrieval overhead. Although recent approaches such as Matryoshka Representation Learning (MRL) and Contrastive Sparse Representation (CSR) alleviate these issues to some extent, they still suffer from degraded retrieval accuracy. This paper proposes \emph{Isolation Kernel Embedding} (IKE), a learning-free method that transforms an LLM embedding into a binary embedding using the Isolation Kernel (IK). IKE is an ensemble of diverse random partitions, enabling robust estimation of the ideal kernel in the LLM embedding space, so retrieval accuracy loss shrinks as the ensemble grows. Lightweight and based on binary encoding, it offers a low memory footprint and fast bitwise computation, lowering retrieval latency. Experiments on multiple text retrieval datasets demonstrate that IKE achieves up to 16.7x faster retrieval and 16x lower memory usage than raw LLM embeddings, while maintaining comparable or better accuracy. Compared to CSR and other compression methods, IKE consistently achieves the best balance between retrieval efficiency and effectiveness.
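To make the mechanism concrete, the following is a minimal sketch of an Isolation-Kernel-style binary encoding, assuming Voronoi-style partitions: each of $t$ partitions is defined by $\psi$ points sampled from the data, a vector is encoded as the one-hot index of its nearest sampled point in each partition, and similarity is the fraction of partitions in which two vectors share a cell (a bitwise dot product). Function names, parameters, and the choice of partitioning are illustrative, not the paper's exact implementation.

```python
import numpy as np


def ik_binary_embed(X, queries, t=200, psi=16, rng=None):
    """Sketch: encode `queries` as t*psi-bit codes via t random partitions.

    Each partition is built from psi points drawn from X; a query's code
    sets one bit per partition, marking the cell (nearest sampled point)
    it falls into. Exactly t bits are set per code.
    """
    rng = np.random.default_rng(rng)
    codes = np.zeros((len(queries), t * psi), dtype=np.uint8)
    for i in range(t):
        centers = X[rng.choice(len(X), size=psi, replace=False)]
        # distance from every query to this partition's psi centers
        d = np.linalg.norm(queries[:, None, :] - centers[None, :, :], axis=2)
        # one-hot: mark the occupied cell in partition i
        codes[np.arange(len(queries)), i * psi + d.argmin(axis=1)] = 1
    return codes


def ik_similarity(a, b, t):
    # kernel estimate: fraction of partitions where a and b share a cell;
    # on packed bits this reduces to a popcount of the AND, hence fast
    return np.dot(a.astype(np.int32), b.astype(np.int32)) / t
```

The dot product over binary codes is what enables the bitwise speedups the abstract refers to: with codes packed into machine words, it becomes an AND followed by a popcount.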