This paper addresses the gap between general-purpose text embeddings and the specific demands of item retrieval tasks. We demonstrate that existing models fail to capture the nuances necessary for zero-shot performance on item retrieval tasks. To overcome these limitations, we propose generating an in-domain dataset from ten tasks tailored to unlock models' representation ability for item retrieval. Our empirical studies demonstrate that fine-tuning embedding models on this dataset leads to remarkable improvements across a variety of retrieval tasks. We also illustrate the practical application of our refined model in a conversational setting, where it enhances the capabilities of LLM-based Recommender Agents such as Chat-Rec. Our code is available at https://github.com/microsoft/RecAI.