Dual-Diffusional Generative Fashion Recommendation

Personalized generative recommender systems have emerged as a promising solution for fashion recommendation. However, existing methods primarily rely on implicit visual embeddings from historical interactions, which often contain preference-irrelevant information and result in insufficient user behavior modeling. Moreover, these models typically generate only item images, providing limited interpretability. To address these limitations, we propose DualFashion, a Dual-Diffusional Generative Fashion Recommendation Architecture that jointly models image and text modalities for personalized and explainable recommendation. DualFashion adopts a dual-diffusion Transformer with image and text branches, where structured attribute-level captions and visual outfit information are jointly used as conditioning signals to model user behavior. The proposed architecture produces both fashion item images and textual descriptions, ensuring visual compatibility while providing explicit semantic interpretability. Furthermore, we introduce a text-augmented fine-tuning strategy that enhances generation diversity and enables effective cross-modal knowledge transfer without incurring heavy computational costs. Extensive experiments on iFashion and Polyvore-U across Personalized Fill-in-the-Blank and Generative Outfit Recommendation tasks demonstrate that DualFashion achieves strong performance in behavior modeling, interpretability, and efficiency compared to state-of-the-art methods. Our code and model checkpoints are available at https://github.com/LinkMingzhe/DualFashion.

