Reply with Sticker: New Dataset and Model for Sticker Retrieval

Stickers are widely used in online chat on social media platforms, where they can express a user's intention, emotion, or attitude in a vivid, tactful, and intuitive way. Existing sticker retrieval research typically retrieves stickers based on the conversation context and the user's current utterance; that is, the sticker serves as a supplement to the current utterance. In real-world conversations, however, stickers are also used to express what we want to say directly, not only to supplement our words. In this paper, we therefore create a new dataset for sticker retrieval in conversation, called StickerInt, in which stickers are used either to reply to the previous conversation or to supplement our words. Based on this dataset, we present a simple yet effective framework for sticker retrieval in conversation, named Int-RA, which learns intention and the cross-modal relationships between conversation context and stickers. Specifically, we first devise a knowledge-enhanced intention predictor to introduce intention information into the conversation representations. A relation-aware sticker selector then retrieves the response sticker via cross-modal relationships. Extensive experiments on the created dataset show that the proposed model achieves state-of-the-art performance in sticker retrieval.
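As a rough illustration of the retrieval task only (not the Int-RA model itself), sticker retrieval can be framed as ranking candidate stickers by how well their representations match a representation of the conversation. The sketch below assumes both are already encoded as fixed-length vectors and uses cosine similarity as the matching score; the function names, sticker ids, and toy embeddings are all hypothetical and do not come from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_stickers(context_vec, sticker_vecs):
    """Return sticker ids sorted from most to least similar to the context vector."""
    scores = {sid: cosine(context_vec, vec) for sid, vec in sticker_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy, made-up embeddings: in a real system these would come from a
# conversation encoder and a sticker (image) encoder respectively.
context = [0.9, 0.1, 0.0]
stickers = {
    "thumbs_up": [1.0, 0.0, 0.0],
    "crying": [0.0, 1.0, 0.0],
    "shrug": [0.0, 0.0, 1.0],
}

ranking = rank_stickers(context, stickers)
print(ranking)
```

The top-ranked id would be returned as the response sticker. A learned model such as the one described above would replace the fixed cosine score with a trained cross-modal matching function.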

