We present a method for zero-shot recommendation of multimodal non-stationary content that leverages recent advances in generative AI. We propose rendering inputs of different modalities as textual descriptions and using pre-trained LLMs to obtain their numerical representations by computing semantic embeddings. Once unified representations of all content items are obtained, recommendation can be performed by computing an appropriate similarity metric between them, without any additional learning. We demonstrate our approach on a synthetic multimodal nudging environment, where the inputs consist of tabular, textual, and visual data.
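The pipeline described above (render each input as text, embed it, rank items by similarity) can be illustrated with a minimal sketch. It is not the authors' implementation. The `embed` function is a toy hashed bag-of-words stand-in for a pre-trained LLM embedding model, cosine similarity is assumed as the similarity metric, and all names (`render_tabular`, `embed`, `recommend`, the example user and items) are hypothetical. The visual item is assumed to have already been rendered as a caption.

```python
import hashlib
import math
import re

DIM = 4096  # size of the toy embedding space (assumption, not from the paper)


def render_tabular(row: dict) -> str:
    """Render a tabular record as a plain-text description."""
    return ", ".join(f"{key}: {value}" for key, value in row.items())


def embed(text: str) -> list[float]:
    """Toy stand-in for an LLM embedding: hashed bag of words, L2-normalised.

    A real system would call a pre-trained embedding model here instead.
    """
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        # hashlib gives a hash that is stable across runs, unlike built-in hash()
        idx = int(hashlib.md5(token.encode()).hexdigest(), 16) % DIM
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors that are already unit length."""
    return sum(x * y for x, y in zip(a, b))


def recommend(query_text: str, items: dict[str, str], k: int = 1) -> list[str]:
    """Return the names of the k items most similar to the query text."""
    query_vec = embed(query_text)
    ranked = sorted(
        items.items(),
        key=lambda kv: cosine(query_vec, embed(kv[1])),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Hypothetical example: a tabular user record and three content items,
# each already rendered as text (the second one from an image caption).
user = {"age": 34, "goal": "improve sleep", "activity": "low"}
items = {
    "sleep_tip": "Tips to improve sleep with a regular bedtime routine and low evening activity",
    "run_plan": "Image caption: a runner training for a marathon at sunrise",
    "budget_note": "Table row: monthly budget, savings rate 10 percent",
}
top = recommend(render_tabular(user), items, k=1)
print(top)
```

No model is trained or fine-tuned at any point: new items only need to be rendered as text and embedded. This is why such a scheme can handle content that changes over time. Swapping the toy `embed` for a real embedding model would leave the rest of the sketch unchanged.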