Product bundling provides clients with strategic combinations of individual items, and it has gained significant attention in recent years as a fundamental prerequisite for online services. Recent methods utilize multimodal information through sophisticated extractors for bundling, but remain limited by inferior semantic understanding, a restricted scope of knowledge, and an inability to handle cold-start issues. Despite the extensive knowledge and complex reasoning capabilities of large language models (LLMs), directly applying them fails to process multiple modalities and to exploit their knowledge for multimodal product bundling. Adapting LLMs for this purpose requires demonstrating the synergies among different modalities and designing an effective optimization strategy for bundling, both of which remain challenging. To this end, we introduce Bundle-LLM to bridge the gap between LLMs and product bundling tasks. Specifically, we utilize hybrid item tokenization to integrate multimodal information, where a simple yet powerful multimodal fusion module followed by a trainable projector embeds all non-textual features into a single token. This module not only explicitly exhibits the interplay among modalities but also shortens the prompt, thereby boosting efficiency. By designing a prompt template, we formulate product bundling as a multiple-choice question over candidate items. Furthermore, we adopt a progressive optimization strategy to fine-tune the LLMs with disentangled objectives, achieving effective product bundling capability with comprehensive multimodal semantic understanding. Extensive experiments on four datasets from two application domains show that our approach outperforms a range of state-of-the-art (SOTA) methods.
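The hybrid item tokenization described above can be sketched roughly as follows: an item's non-textual features (e.g., visual and audio embeddings) are fused and projected into a single vector in the LLM's input embedding space, so the item costs one token instead of many. This is a minimal NumPy illustration under assumed feature dimensions; the modality types, layer sizes, and fusion form are placeholders, not the paper's exact architecture, and the weights shown randomly initialized would in practice be trained.

```python
import numpy as np

# Hypothetical feature dimensions (illustrative, not from the paper)
D_VIS, D_AUD, D_HID, D_LLM = 512, 128, 256, 4096

rng = np.random.default_rng(0)

# Parameters of the fusion module and trainable projector (randomly
# initialized here; in practice they are learned during fine-tuning)
W_fuse = rng.normal(0.0, 0.02, size=(D_VIS + D_AUD, D_HID))
W_proj = rng.normal(0.0, 0.02, size=(D_HID, D_LLM))

def fuse_to_token(visual_feat: np.ndarray, audio_feat: np.ndarray) -> np.ndarray:
    """Fuse an item's non-textual features and project them to a single
    token embedding in the LLM's input space."""
    fused = np.concatenate([visual_feat, audio_feat]) @ W_fuse  # joint fusion
    fused = np.tanh(fused)                                      # nonlinearity
    return fused @ W_proj                                       # one LLM-space token

token = fuse_to_token(rng.normal(size=D_VIS), rng.normal(size=D_AUD))
print(token.shape)  # (4096,)
```

Representing each item's non-textual content as one token keeps the prompt short, which is the efficiency gain the abstract refers to.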
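The multiple-choice formulation can likewise be illustrated with a toy prompt builder. The template wording, item descriptions, and answer format below are hypothetical; the paper's actual template (and any special item tokens it interleaves) is not reproduced here.

```python
# Hypothetical prompt template for bundling as a multiple-choice question
TEMPLATE = (
    "Given a partial bundle containing: {seed_items}.\n"
    "Which of the following candidate items best completes the bundle?\n"
    "{choices}\n"
    "Answer with the letter of your choice."
)

def build_prompt(seed_items: list[str], candidates: list[str]) -> str:
    """Format seed items and lettered candidates into one prompt string."""
    choices = "\n".join(
        f"{chr(ord('A') + i)}. {item}" for i, item in enumerate(candidates)
    )
    return TEMPLATE.format(seed_items=", ".join(seed_items), choices=choices)

prompt = build_prompt(
    ["acoustic guitar", "guitar strap"],
    ["guitar picks", "drum sticks", "piano bench"],
)
print(prompt)
```

Casting bundling as selection over an explicit candidate list constrains the LLM's output to a small label set, which simplifies both fine-tuning supervision and answer parsing.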