In the era of large foundation models, embedding quality has become a central determinant of downstream task performance and overall system capability. Yet widely used dense embeddings are often extremely high-dimensional, incurring substantial storage, memory, and inference-latency costs. To address these costs, Contrastive Sparse Representation (CSR) was recently proposed as a promising direction: it maps dense embeddings into high-dimensional but k-sparse vectors, in contrast to compact dense embeddings such as Matryoshka Representation Learning (MRL). Despite its promise, CSR degrades severely in the ultra-sparse regime, where over 80% of neurons remain inactive, leaving much of its efficiency potential unrealized. In this paper, we introduce CSRv2, a principled training approach designed to make ultra-sparse embeddings viable. CSRv2 stabilizes sparsity learning through progressive k-annealing, enhances representational quality via supervised contrastive objectives, and ensures end-to-end adaptability with full backbone finetuning. CSRv2 reduces the fraction of dead neurons from 80% to 20% and delivers a 14% accuracy gain at k=2, bringing ultra-sparse embeddings on par with CSR at k=8 and with MRL at 32 dimensions, all with only two active features. While maintaining comparable performance, CSRv2 delivers a 7x speedup over MRL and yields up to 300x improvements in compute and memory efficiency relative to dense embeddings in text representation. Extensive experiments across text and vision demonstrate that CSRv2 makes ultra-sparse embeddings practical without compromising performance: CSRv2 achieves a 7%/4% improvement over CSR at k=4 on text/vision representation, and widens this gap to 14%/6% at k=2. By making extreme sparsity viable, CSRv2 broadens the design space for real-time and edge-deployable AI systems where both embedding quality and efficiency are critical.
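The two core mechanisms named above, top-k sparse codes and progressive k-annealing, can be sketched as follows. This is a minimal illustration only, not the paper's implementation: the dimensions, the linear encoder, and the linear annealing schedule are all assumptions.

```python
import torch
import torch.nn as nn


class TopKSparseHead(nn.Module):
    """Illustrative head mapping a dense embedding to a high-dimensional
    k-sparse code by keeping only the k largest activations.
    (Hypothetical sketch; not CSRv2's actual architecture.)"""

    def __init__(self, dense_dim: int = 768, sparse_dim: int = 8192):
        super().__init__()
        self.encoder = nn.Linear(dense_dim, sparse_dim)

    def forward(self, x: torch.Tensor, k: int) -> torch.Tensor:
        z = torch.relu(self.encoder(x))
        # Keep the top-k activations per example; zero out the rest,
        # yielding a vector with at most k active features.
        topk = torch.topk(z, k, dim=-1)
        return torch.zeros_like(z).scatter_(-1, topk.indices, topk.values)


def annealed_k(step: int, total_steps: int,
               k_start: int = 32, k_final: int = 2) -> int:
    """Progressive k-annealing: shrink the active-feature budget from
    k_start down to k_final over training. The linear schedule and the
    endpoint values are assumptions for illustration."""
    frac = min(step / total_steps, 1.0)
    return max(round(k_start + (k_final - k_start) * frac), k_final)
```

Annealing k gradually, rather than training at the target sparsity from the start, is one way to stabilize learning: early training sees a loose budget, and the constraint tightens toward the ultra-sparse regime (e.g. k=2) only as the code space matures.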