In the era of large foundation models, embedding quality has become a central determinant of downstream task performance and overall system capability. Yet widely used dense embeddings are often extremely high-dimensional, incurring substantial costs in storage, memory, and inference latency. To address these costs, Contrastive Sparse Representation (CSR) was recently proposed as a promising direction: it maps dense embeddings into high-dimensional but k-sparse vectors, in contrast to compact dense embeddings such as those produced by Matryoshka Representation Learning (MRL). Despite its promise, CSR suffers severe degradation in the ultra-sparse regime, where over 80% of neurons remain inactive, leaving much of its efficiency potential unrealized. In this paper, we introduce CSRv2, a principled training approach designed to make ultra-sparse embeddings viable. CSRv2 stabilizes sparsity learning through progressive k-annealing, enhances representational quality via supervised contrastive objectives, and ensures end-to-end adaptability with full backbone finetuning. CSRv2 reduces the fraction of dead neurons from 80% to 20% and delivers a 14% accuracy gain at k=2, bringing ultra-sparse embeddings on par with CSR at k=8 and MRL at 32 dimensions, all with only two active features. While maintaining comparable performance, CSRv2 delivers a 7x speedup over MRL and yields up to 300x improvements in compute and memory efficiency relative to dense embeddings in text representation. Extensive experiments across text and vision demonstrate that CSRv2 makes ultra-sparse embeddings practical without compromising performance: in text/vision representation, CSRv2 achieves a 7%/4% improvement over CSR at k=4, and this gap widens to 14%/6% at k=2. By making extreme sparsity viable, CSRv2 broadens the design space for real-time and edge-deployable AI systems where both embedding quality and efficiency are critical.