Semantic search with large language models (LLMs) enables retrieval by meaning rather than keyword overlap, but scaling it requires major advances in inference efficiency. We present LinkedIn's LLM-based semantic search framework for AI Job Search and AI People Search, which combines an LLM relevance judge, embedding-based retrieval, and a compact Small Language Model trained via multi-teacher distillation to jointly optimize relevance and engagement. A prefill-oriented inference architecture, co-designed with model pruning, context compression, and text-embedding hybrid interactions, boosts ranking throughput by over 75x under a fixed latency constraint while preserving near-teacher-level NDCG. The result is one of the first production LLM-based ranking systems with efficiency comparable to traditional approaches, delivering significant gains in quality and user engagement.
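The multi-teacher distillation mentioned above can be sketched as a weighted sum of soft-label distillation losses, one per teacher. The sketch below is illustrative only and assumes nothing beyond the abstract: the function names, the two-teacher setup (a relevance teacher and an engagement teacher), and the weights are all hypothetical, and the paper's actual objective may differ.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax over a list of raw scores.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    s = sum(exps)
    return [e / s for e in exps]

def kl_divergence(p, q):
    # KL(p || q) for two discrete distributions of equal length.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def multi_teacher_distillation_loss(student_logits, teacher_logits_list,
                                    teacher_weights, temperature=2.0):
    # Weighted sum of KL divergences from each teacher's softened
    # distribution to the student's, as in standard soft-label
    # distillation; the weights trade off the teachers' objectives
    # (e.g. relevance vs. engagement).
    q = softmax(student_logits, temperature)
    loss = 0.0
    for logits, w in zip(teacher_logits_list, teacher_weights):
        p = softmax(logits, temperature)
        loss += w * kl_divergence(p, q)
    return loss

# Hypothetical logits over three candidate results.
student = [2.0, 0.5, -1.0]
relevance_teacher = [2.5, 0.3, -1.2]
engagement_teacher = [1.0, 1.5, -0.5]
loss = multi_teacher_distillation_loss(
    student, [relevance_teacher, engagement_teacher], [0.6, 0.4])
```

The loss is zero when the student matches every teacher exactly and grows as their softened distributions diverge, so minimizing it pulls the compact student toward a weighted consensus of the teachers.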