Leveraging Large Multimodal Models (LMMs) to simulate human behavior when processing multimodal information, especially in the context of social media, has garnered immense interest due to its broad potential and far-reaching implications. Emojis, among the most distinctive elements of digital communication, are pivotal in enriching, and often clarifying, the emotional and tonal dimensions of online text. Yet there is a notable gap in understanding how advanced models such as GPT-4V interpret and employ emojis in the nuanced context of online interaction. This study aims to bridge that gap by examining how closely GPT-4V replicates human-like emoji use. The findings reveal a discernible discrepancy between human and GPT-4V behavior, likely stemming from the subjective nature of human interpretation and the limitations of GPT-4V's English-centric training data, which suggest cultural bias and inadequate representation of non-English cultures.