Developing a universal model that can effectively harness heterogeneous resources and respond to a wide range of personalized needs has been a longstanding aspiration of the community. Our daily choices, especially in domains like fashion and retail, are substantially shaped by multi-modal data, such as pictures and textual descriptions. These modalities not only offer intuitive guidance but also cater to personalized user preferences. However, the predominant personalization approaches focus mainly on ID-based or text-based recommendation and fail to comprehend information that spans multiple tasks or modalities. In this paper, our goal is to establish a Unified paradigm for Multi-modal Personalization systems (UniMP), which effectively leverages multi-modal data while eliminating the complexities associated with task- and modality-specific customization. We argue that advances in foundational generative modeling provide the flexibility and effectiveness necessary to achieve this objective. In light of this, we develop a generic and extensible generative personalization framework that can handle a wide range of personalized needs, including item recommendation, product search, preference prediction, explanation generation, and user-guided image generation. Our methodology enhances the capabilities of foundational language models for personalized tasks by seamlessly ingesting interleaved cross-modal user history information, ensuring a more precise and customized experience for users. To train and evaluate the proposed multi-modal personalized tasks, we also introduce a novel and comprehensive benchmark covering a variety of user requirements. Our experiments on this real-world benchmark showcase the model's potential, outperforming competitive methods specialized for each task.