Generating an informative and attractive product title is a crucial task in e-commerce. Most existing works follow standard multimodal natural language generation approaches, e.g., image captioning, and rely on large-scale human-labelled datasets to train the desired models. However, for novel products, especially those in a different domain, little labelled data exists. In this paper, we propose a prompt-based approach, i.e., the Multimodal Prompt Learning framework, to accurately and efficiently generate titles for novel products with limited labels. We observe that the core challenges of novel product title generation are understanding the characteristics of novel products and generating titles in a novel writing style. To this end, we build a set of multimodal prompts from different modalities to preserve the corresponding characteristics and writing styles of novel products. As a result, with extremely limited labels for training, the proposed method can retrieve the multimodal prompts to generate desirable titles for novel products. Experiments and analyses are conducted on five novel product categories under both in-domain and out-of-domain settings. The results show that, with only 1% of the downstream labelled data for training, our approach achieves the best few-shot results and is even competitive with fully supervised methods trained on 100% of the training data; with the full labelled data for training, our method achieves state-of-the-art results.