Generative Retrieval (GR) has emerged as a promising paradigm that unifies indexing and search within a single probabilistic framework. However, existing approaches suffer from two intrinsic conflicts: (1) an Optimization Blockage, where the non-differentiable nature of discrete indexing blocks gradient flow, decoupling index construction from the downstream retrieval objective; and (2) a Geometric Conflict, where standard unnormalized inner-product objectives induce norm-inflation instability, causing popular "hub" items to geometrically overshadow relevant long-tail items. To systematically resolve these misalignments, we propose Differentiable Geometric Indexing (DGI). First, to bridge the optimization gap, DGI enforces Operational Unification: it employs Soft Teacher Forcing via Gumbel-Softmax to establish a fully differentiable pathway, combined with Symmetric Weight Sharing to align the quantizer's indexing space with the retriever's decoding space. Second, to restore geometric fidelity, DGI introduces Isotropic Geometric Optimization: we replace inner-product logits with scaled cosine similarity on the unit hypersphere, decoupling popularity bias from semantic relevance. Extensive experiments on large-scale industry search datasets and an online e-commerce platform demonstrate that DGI outperforms competitive sparse, dense, and generative baselines. Notably, DGI exhibits superior robustness in long-tail scenarios, validating the necessity of harmonizing structural differentiability with geometric isotropy.
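To make the first mechanism concrete, below is a minimal PyTorch sketch of a differentiable indexing pathway in the spirit of Soft Teacher Forcing with Symmetric Weight Sharing. It is not the authors' implementation: all names (`SoftQuantizer`, `codebook`, `doc_emb`, the temperature `tau`) are illustrative assumptions. The key ideas it demonstrates are (a) a straight-through Gumbel-Softmax so the retrieval loss can backpropagate through the discrete code assignment, and (b) a single codebook matrix reused as the decoder's output embedding.

```python
import torch
import torch.nn.functional as F

class SoftQuantizer(torch.nn.Module):
    """Illustrative sketch of a differentiable quantizer for generative
    retrieval; names and structure are assumptions, not the paper's code."""

    def __init__(self, num_codes: int, dim: int, tau: float = 1.0):
        super().__init__()
        # One shared matrix plays two roles ("Symmetric Weight Sharing"):
        # it scores documents against codes during indexing, and the same
        # weights would serve as the retriever decoder's output embedding.
        self.codebook = torch.nn.Parameter(torch.randn(num_codes, dim))
        self.tau = tau

    def forward(self, doc_emb: torch.Tensor):
        # Affinity of each document embedding to every code: (B, num_codes).
        logits = doc_emb @ self.codebook.t()
        # Straight-through Gumbel-Softmax ("Soft Teacher Forcing"): the
        # forward pass emits a hard one-hot code, while gradients flow
        # through the soft relaxation, keeping indexing differentiable.
        one_hot = F.gumbel_softmax(logits, tau=self.tau, hard=True)
        code_emb = one_hot @ self.codebook  # embedding of the chosen code
        return one_hot.argmax(dim=-1), code_emb
```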
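The second mechanism is simpler to illustrate. The sketch below shows scaled cosine-similarity logits replacing raw inner products; the scale value is a hypothetical temperature hyperparameter, not a number from the paper. Because both sides are L2-normalized onto the unit hypersphere, an item's embedding norm no longer contributes to its logit, which is precisely how norm inflation from popular "hub" items is decoupled from semantic relevance.

```python
import torch
import torch.nn.functional as F

def cosine_logits(query: torch.Tensor, items: torch.Tensor,
                  scale: float = 20.0) -> torch.Tensor:
    """Scaled cosine-similarity logits on the unit hypersphere (sketch).

    L2-normalization removes item-norm effects, so a frequently clicked
    item with an inflated embedding norm cannot dominate the softmax by
    magnitude alone; `scale` restores a usable logit range after
    similarities are squashed into [-1, 1].
    """
    q = F.normalize(query, dim=-1)   # (B, d), unit-norm queries
    v = F.normalize(items, dim=-1)   # (N, d), unit-norm item embeddings
    return scale * (q @ v.t())       # (B, N) logits in [-scale, scale]
```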