KARMA: Knowledge-Action Regularized Multimodal Alignment for Personalized Search at Taobao

Large Language Models (LLMs) possess rich semantic knowledge, making them a natural choice for injecting semantic generalization into personalized search systems. In practice, however, we find that directly fine-tuning LLMs on industrial personalized tasks (e.g., next-item prediction) often yields suboptimal results. We attribute this bottleneck to a critical Knowledge-Action Gap: the inherent conflict between preserving pre-trained semantic knowledge and aligning with specific personalized actions through discriminative objectives. Empirically, action-only training objectives induce Semantic Collapse, which manifests, for example, as attention "sinks". This degradation severely impairs the LLM's generalization, so fine-tuning fails to improve the personalized search system. We propose KARMA (Knowledge-Action Regularized Multimodal Alignment), a unified framework that treats semantic reconstruction as a train-only regularizer. KARMA optimizes a next-interest embedding for retrieval (Action) while enforcing semantic decodability (Knowledge) through two complementary objectives: (i) history-conditioned semantic generation, which anchors optimization to the LLM's native next-token distribution, and (ii) embedding-conditioned semantic reconstruction, which constrains the interest embedding to remain semantically recoverable. On the Taobao search system, KARMA mitigates semantic collapse (as shown by attention-sink analysis) and improves both action metrics and semantic fidelity. In ablations, semantic decodability yields gains of up to +22.5 HR@200. With KARMA, we achieve +0.25 CTR AUC at the ranking stage, +1.86 HR at the pre-ranking stage, and +2.51 HR at the recall stage. Deployed online at the ranking and pre-ranking stages with low inference overhead, KARMA drives a +0.9% increase in GMV.
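The objective described in the abstract can be read as a weighted sum of one action term and two token-level semantic terms. The following is a minimal, stdlib-only sketch under stated assumptions, not the KARMA implementation. The action term is assumed to be an in-batch InfoNCE (softmax contrastive) loss between next-interest embeddings and target-item embeddings. The two semantic terms are assumed to be token-level cross-entropy over the target item's text. The generation logits (conditioned on the history) and the reconstruction logits (conditioned on the interest embedding) are taken as precomputed inputs from a hypothetical LLM head. The names `action_loss`, `token_ce`, `karma_style_loss`, `hit_rate_at_k`, and the weights `lam_gen`, `lam_rec`, `temperature` are all hypothetical. A helper for HR@K, the retrieval metric cited in the ablations, is included.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def log_softmax(row: Sequence[float]) -> List[float]:
    # Numerically stable log-softmax over one row of scores.
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def action_loss(user_embs: Sequence[Vector], item_embs: Sequence[Vector],
                temperature: float = 0.05) -> float:
    # ASSUMPTION: in-batch InfoNCE. User i's positive is item i; the other
    # items in the batch act as negatives.
    total = 0.0
    for i, u in enumerate(user_embs):
        scores = [dot(u, v) / temperature for v in item_embs]
        total -= log_softmax(scores)[i]
    return total / len(user_embs)


def token_ce(logits: Sequence[Sequence[float]], targets: Sequence[int]) -> float:
    # Mean token-level cross-entropy; logits[t] scores the vocabulary for targets[t].
    total = 0.0
    for row, tgt in zip(logits, targets):
        total -= log_softmax(row)[tgt]
    return total / len(targets)


def karma_style_loss(user_embs, item_embs, gen_logits, rec_logits, target_tokens,
                     lam_gen: float = 1.0, lam_rec: float = 1.0,
                     temperature: float = 0.05) -> Tuple[float, Dict[str, float]]:
    # HYPOTHETICAL composite objective:
    #   action: retrieval loss on the next-interest embedding
    #   gen:    item text predicted from the history (history-conditioned generation)
    #   rec:    item text decoded from the embedding (embedding-conditioned reconstruction)
    parts = {
        "action": action_loss(user_embs, item_embs, temperature),
        "gen": token_ce(gen_logits, target_tokens),
        "rec": token_ce(rec_logits, target_tokens),
    }
    total = parts["action"] + lam_gen * parts["gen"] + lam_rec * parts["rec"]
    return total, parts


def hit_rate_at_k(user_embs: Sequence[Vector], item_embs: Sequence[Vector], k: int) -> float:
    # Fraction of users whose target item (same index) ranks in the top-k by dot product.
    hits = 0
    for i, u in enumerate(user_embs):
        scores = [dot(u, v) for v in item_embs]
        ranked = sorted(range(len(item_embs)), key=lambda j: -scores[j])
        if i in ranked[:k]:
            hits += 1
    return hits / len(user_embs)
```

In this sketch, both semantic terms only shape training. A deployed model would keep just the embedding path used by `action_loss` and `hit_rate_at_k`, and could drop the generation and reconstruction logits (and whatever heads produce them). This matches the abstract's description of semantic reconstruction as a train-only regularizer.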