Designing an embedding retrieval system means navigating a design space shaped by trade-offs between efficiency and effectiveness. This work structures these decisions as a vertical traversal of the system design stack. We begin with the Representation Layer, examining how loss functions and architectures, specifically bi-encoders and cross-encoders, define semantic relevance and geometric projection. Next, we analyze the Granularity Layer, evaluating how segmentation strategies such as Atomic and Hierarchical chunking mitigate information bottlenecks in long documents. Moving to the Orchestration Layer, we discuss methods that go beyond the single-vector paradigm, including hierarchical retrieval, agentic decomposition, and multi-stage reranking pipelines, to address capacity limitations. Finally, we address the Robustness Layer by identifying architectural mitigations for domain generalization failures, lexical blind spots, and the silent degradation of retrieval quality caused by temporal drift. By categorizing these limitations and design choices, we provide a comprehensive framework that helps practitioners navigate the efficiency-effectiveness frontier in modern neural search systems.