Emojis improve communication among smartphone users who exchange text through mobile keyboards. Predicting emojis from input text on a device poses three requirements: the classifier must fit within tight on-device memory and latency budgets, it must cover a wide range of emoji classes even though emoji datasets are typically imbalanced, and its output must adapt to each user's favorite emojis. This paper proposes an on-device emoji classifier based on MobileBERT with memory and latency requirements that are reasonable for SwiftKey. To address the data imbalance, we use GPT to generate one or more tags for each emoji class. For each emoji and its corresponding tags, we generate sentences with GPT, label them with that emoji without human intervention, and merge them with the original dataset. At inference time, we interpolate the classifier output with the user's emoji history to improve predictions. Results show that the proposed classifier, deployed for SwiftKey, improves emoji prediction accuracy, particularly on rare emojis, and increases emoji engagement.
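The inference-time personalization step can be illustrated with a minimal sketch. The abstract does not specify the form of the interpolation, so the example below assumes a simple linear blend between the classifier's probabilities and the user's relative emoji frequencies. The function names `interpolate` and `top_k`, the weight `lam`, and the emoji labels are hypothetical and chosen only for illustration.

```python
from collections import Counter


def interpolate(model_probs, history_counts, lam=0.7):
    """Blend classifier probabilities with a user's emoji usage frequencies.

    Assumption: a linear interpolation with weight `lam` on the model;
    the paper's actual interpolation scheme may differ.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    total = sum(history_counts.values())
    if total == 0:
        # No usage history yet: fall back to the classifier output alone.
        return dict(model_probs)
    blended = {}
    for emoji in set(model_probs) | set(history_counts):
        p_model = model_probs.get(emoji, 0.0)
        p_user = history_counts.get(emoji, 0) / total
        blended[emoji] = lam * p_model + (1.0 - lam) * p_user
    return blended


def top_k(scores, k=3):
    """Return the k highest-scoring labels, breaking ties by label name."""
    return sorted(scores, key=lambda e: (-scores[e], e))[:k]


# Hypothetical classifier output and usage history for one user.
model_probs = {"joy": 0.5, "heart": 0.3, "party": 0.2}
history = Counter({"party": 9, "heart": 1})

blended = interpolate(model_probs, history, lam=0.5)
ranking = top_k(blended)
```

In this sketch a user who frequently sends the "party" emoji sees it ranked first even though the classifier alone prefers "joy". A smaller `lam` gives the usage history more influence, and a user with no history receives the unmodified classifier output.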