KARMA: Knowledge-Action Regularized Multimodal Alignment for Personalized Search at Taobao

Large Language Models (LLMs) carry rich semantic knowledge, making them a natural choice for injecting semantic generalization into personalized search systems. In practice, however, we find that directly fine-tuning LLMs on industrial personalization tasks (e.g., next-item prediction) often yields suboptimal results. We attribute this bottleneck to a critical Knowledge-Action Gap: the inherent conflict between preserving pre-trained semantic knowledge and aligning with specific personalized actions through discriminative objectives. Empirically, action-only training objectives induce Semantic Collapse, which manifests in symptoms such as attention "sinks". This degradation cripples the LLM's generalization, so the fine-tuned model fails to improve personalized search. We propose KARMA (Knowledge-Action Regularized Multimodal Alignment), a unified framework that treats semantic reconstruction as a train-only regularizer. KARMA optimizes a next-interest embedding for retrieval (Action) while enforcing semantic decodability (Knowledge) through two complementary objectives: (i) history-conditioned semantic generation, which anchors optimization to the LLM's native next-token distribution, and (ii) embedding-conditioned semantic reconstruction, which constrains the interest embedding to remain semantically recoverable. On the Taobao search system, KARMA mitigates semantic collapse (as shown by attention-sink analysis) and improves both action metrics and semantic fidelity. In ablations, semantic decodability yields up to +22.5 HR@200. With KARMA, we achieve +0.25 CTR AUC in ranking, +1.86 HR in pre-ranking, and +2.51 HR in recall. Deployed online with low inference overhead at the ranking stage, KARMA drives a +0.5% increase in Item Click.
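
To make the structure of the training objective concrete, the following is a minimal sketch of how an action loss might be combined with two semantic regularizers. The abstract does not give the loss formulas, so everything here is an assumption made for illustration: the softmax-over-candidates form of the action loss, the temperature, the weights `w_gen` and `w_rec`, and all function names are hypothetical and are not taken from the paper.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def action_loss(interest_emb, item_embs, target_idx, temperature=0.1):
    """Assumed Action term: softmax cross-entropy over candidate items,
    scoring each item by its dot product with the next-interest embedding."""
    logits = [dot(interest_emb, e) / temperature for e in item_embs]
    return -log_softmax(logits)[target_idx]


def token_nll(token_logits, token_ids):
    """Mean next-token negative log-likelihood of a target text.
    Used here for both assumed Knowledge terms: (i) text generated from
    the history, and (ii) text reconstructed from the interest embedding."""
    losses = [-log_softmax(l)[t] for l, t in zip(token_logits, token_ids)]
    return sum(losses) / len(losses)


def karma_style_loss(action, gen_nll, rec_nll, w_gen=1.0, w_rec=1.0):
    """Hypothetical weighted sum: the Action loss regularized by the two
    semantic terms. The weights are illustrative, not from the paper."""
    return action + w_gen * gen_nll + w_rec * rec_nll


if __name__ == "__main__":
    # Toy inputs: 2-d embeddings, 3 candidate items, vocabulary of 4 tokens.
    interest = [0.2, 0.1]
    items = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    l_act = action_loss(interest, items, target_idx=0)
    l_gen = token_nll([[2.0, 0.1, 0.1, 0.1], [0.1, 1.5, 0.1, 0.1]], [0, 1])
    l_rec = token_nll([[1.0, 0.2, 0.2, 0.2]], [0])
    print(karma_style_loss(l_act, l_gen, l_rec, w_gen=0.5, w_rec=0.5))
```

Under this reading, the two semantic terms only shape training: at serving time only the next-interest embedding would be needed, which is consistent with the abstract's description of semantic reconstruction as a train-only regularizer.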
