In the rapidly evolving landscape of social media, the introduction of new emojis in Unicode releases offers a structured opportunity to study the evolution of digital language. Analyzing a large dataset of sampled English tweets, we examine how newly released emojis gain traction and evolve in meaning. We find that the community size of early adopters and the semantics of an emoji are crucial in determining its popularity. Certain emojis underwent notable shifts in meaning and sentiment association during the diffusion process. Additionally, we propose a novel framework that uses language models to extract words and pre-existing emojis with semantically similar contexts, which aids the interpretation of new emojis. The framework proves effective in improving sentiment classification performance by substituting unknown new emojis with familiar ones. This study offers a new perspective on how new language units are adopted, adapted, and integrated into the fabric of online communication.
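The substitution step can be illustrated with a minimal sketch. It assumes that each emoji already has a context vector, for example one derived from a language model; the abstract does not specify the model or the similarity measure, so the cosine similarity, the function names, and the toy three-dimensional vectors below are all hypothetical stand-ins rather than the method used in the study.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def nearest_known_emoji(new_vec, known_vecs):
    """Return the pre-existing emoji whose context vector is closest to new_vec."""
    return max(known_vecs, key=lambda e: cosine(new_vec, known_vecs[e]))

def substitute_new_emojis(text, new_vecs, known_vecs):
    """Replace each new emoji in text with its most similar pre-existing emoji."""
    for emoji, vec in new_vecs.items():
        if emoji in text:
            text = text.replace(emoji, nearest_known_emoji(vec, known_vecs))
    return text

# Toy context vectors, invented for illustration only (not data from the study).
KNOWN = {"😂": [0.9, 0.1, 0.0], "😍": [0.1, 0.9, 0.1], "😢": [0.0, 0.2, 0.9]}
NEW = {"🥹": [0.1, 0.3, 0.8]}

print(substitute_new_emojis("so proud of you 🥹", NEW, KNOWN))
```

With these toy vectors the new emoji is mapped to 😢, so a sentiment classifier that has never seen 🥹 receives a token it already knows. In practice the vectors would come from the contexts in which each emoji appears, and the choice of substitute would depend on that model.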