Large Language Model (LLM) agents are increasingly used in real-world products, where personalized and context-aware user interactions are essential. A central enabler of these capabilities is the agent's long-term semantic memory system, which extracts implicit and explicit signals from noisy longitudinal behavioral data, stores them in a structured form, and supports low-latency retrieval. Building industrial-grade long-term memory for LLM agents raises five challenges: scalability, low-latency retrieval, privacy constraints, cross-domain generalizability, and observability. We introduce the Hierarchical Long-Term Semantic Memory (HLTM) framework, which organizes textual data into a schema-aligned memory tree that captures semantic knowledge at multiple levels of granularity. This structure enables scalable ingestion, privacy-aware storage, low-latency retrieval, and transparent provenance. HLTM also incorporates an adaptation mechanism that lets it generalize across diverse use cases. Extensive evaluations on LinkedIn's Hiring Assistant show that HLTM improves both answer correctness and retrieval F1 by more than 10%, while advancing the Pareto frontier between query latency and indexing latency. HLTM has been deployed in LinkedIn's Hiring Assistant, where it powers core personalization features in production hiring workflows.