UMAIR-FPS: User-aware Multi-modal Animation Illustration Recommendation Fusion with Painting Style

The rapid advancement of high-quality AI image generation models has produced a deluge of anime illustrations. Recommending illustrations to users from such massive data has become a challenging and popular task. However, existing anime recommendation systems focus on text features and have yet to integrate image features. In addition, most multi-modal recommendation research is constrained by tightly coupled datasets, which limits its applicability to anime illustrations. To address these gaps, we propose User-aware Multi-modal Animation Illustration Recommendation Fusion with Painting Style (UMAIR-FPS). In the feature extraction phase, for image features, we are the first to combine painting style features with semantic features, constructing a dual-output image encoder that enhances the image representation. For text features, we obtain text embeddings by fine-tuning Sentence-Transformers with domain knowledge, composing a variety of domain text pairs from three perspectives: multilingual mappings, entity relationships, and term explanations. In the multi-modal fusion phase, we propose a novel user-aware multi-modal contribution measurement mechanism that weights multi-modal features dynamically according to user features at the interaction level, and we employ the DCN-V2 module to model bounded-degree multi-modal feature crosses effectively. UMAIR-FPS surpasses state-of-the-art baselines on large real-world datasets, demonstrating substantial performance improvements.
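The fusion phase described above can be illustrated with a minimal sketch. It is not the authors' implementation: the softmax gate over per-modality scores, the helper names (`user_aware_weights`, `fuse`, `cross_layer`, `random_matrix`), the toy dimensions, and the random weights are all assumptions made for illustration. Only the cross-layer update, x_{l+1} = x_0 ⊙ (W x_l + b) + x_l, follows the standard DCN-V2 formulation, in which stacking L cross layers bounds the polynomial degree of the feature crosses at L + 1.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def user_aware_weights(user_vec, gate_W):
    """Hypothetical gate: one score per modality computed from the user
    vector, normalised with softmax so the weights sum to 1."""
    return softmax(matvec(gate_W, user_vec))

def fuse(user_vec, modality_vecs, gate_W):
    """Scale each modality embedding by its user-dependent weight, then concatenate."""
    weights = user_aware_weights(user_vec, gate_W)
    fused = []
    for w, vec in zip(weights, modality_vecs):
        fused.extend(w * v for v in vec)
    return weights, fused

def cross_layer(x0, xl, W, b):
    """DCN-V2 cross layer: x_{l+1} = x0 * (W xl + b) + xl (element-wise product)."""
    proj = matvec(W, xl)
    return [x0_i * (p + b_i) + xl_i
            for x0_i, p, b_i, xl_i in zip(x0, proj, b, xl)]

def random_matrix(rows, cols, rng, scale=0.1):
    """Small random weights standing in for learned parameters."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

# Toy inputs (assumed shapes): a 3-d user vector and two 4-d modality embeddings.
rng = random.Random(0)
user_vec = [0.2, -0.5, 0.1]
image_vec = [0.3, 0.7, -0.2, 0.4]
text_vec = [-0.1, 0.5, 0.9, 0.0]

gate_W = random_matrix(2, len(user_vec), rng)   # one row per modality
weights, fused = fuse(user_vec, [image_vec, text_vec], gate_W)

# Two stacked cross layers over the concatenated user and fused features.
x0 = user_vec + fused
x = x0
for _ in range(2):
    W = random_matrix(len(x0), len(x0), rng)
    b = [0.0] * len(x0)
    x = cross_layer(x0, x, W, b)
```

Because the gate takes the user vector as input, the same illustration can receive different image and text weights for different users, which is the behaviour the abstract attributes to the contribution measurement mechanism. In a trained model, `gate_W` and the cross-layer parameters would be learned rather than sampled.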

