Real Talk, Virtual Faces: A Formal Concept Analysis of Personality and Sentiment in Influencer Audiences

Virtual influencers~(VIs) -- digitally synthetic social-media personas -- attract audiences whose discourse appears qualitatively different from the discourse around human influencers~(HIs). Existing work characterises this difference through surveys or aggregate engagement statistics, which reveal \emph{what} audiences say but not \emph{how} multiple signals co-occur. We propose a two-layer, structure-first framework grounded in Formal Concept Analysis~(FCA) and association rule mining. The first layer applies FCA with support-based iceberg filtering to weekly-aggregated comment data, extracting discourse profiles -- weekly co-occurrence bundles of sentiment, Big Five personality cues, and topic tags. The second layer mines association rules at the comment level, revealing personality--sentiment--topic dependencies that are invisible to frequency-table analysis. Applied to YouTube comments from three VI--HI influencer pairs, the two-layer analysis reveals a consistent structural divergence: HI discourse concentrates into a single, emotionally regulated, stability-centred regime in which low neuroticism anchors positivity, while VI discourse supports three structurally distinct discourse modes, including an appearance-discourse cluster that is absent from HI discourse despite near-equal marginal prevalence. Topic-specific analyses further show that VI contexts exhibit more negative sentiment than HI contexts in psychologically sensitive domains (mental health, body image, artificial identity). Our results position FCA as a principled tool for multi-signal discourse analysis and demonstrate that virtuality reshapes not just what audiences say, but the underlying grammar of how signals co-occur in their reactions.
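To make the two layers concrete, the following is a minimal sketch, not the pipeline used in the study. It assumes a toy binary context with invented weeks and tags, enumerates formal concepts by brute force, keeps those whose extent meets a minimum support (the iceberg filter), and computes the confidence of an association rule. For brevity the same toy context serves both layers, whereas the framework mines rules at the comment level. The names \texttt{context}, \texttt{iceberg\_concepts}, and \texttt{confidence} are hypothetical.

```python
from itertools import combinations

# Toy formal context (hypothetical data, not from the study):
# each object is one week, each attribute a binary tag for that week.
context = {
    "week1": {"positive", "low_neuroticism", "topic_music"},
    "week2": {"positive", "low_neuroticism"},
    "week3": {"negative", "topic_appearance"},
    "week4": {"positive", "low_neuroticism", "topic_appearance"},
}


def extent(attrs, ctx):
    """Objects that carry every attribute in attrs."""
    return frozenset(obj for obj, tags in ctx.items() if attrs <= tags)


def intent(objs, ctx):
    """Attributes shared by every object in objs."""
    if not objs:
        return frozenset(set().union(*ctx.values()))
    return frozenset(set.intersection(*(set(ctx[o]) for o in objs)))


def iceberg_concepts(ctx, min_support):
    """Brute-force enumeration of formal concepts whose relative
    support (|extent| / |objects|) is at least min_support.
    Exponential in the number of attributes; fine for a toy example only."""
    attributes = sorted(set().union(*ctx.values()))
    concepts = set()
    for size in range(len(attributes) + 1):
        for combo in combinations(attributes, size):
            ext = extent(set(combo), ctx)
            if len(ext) / len(ctx) >= min_support:
                # Closing the attribute set yields the concept's intent.
                concepts.add((ext, intent(ext, ctx)))
    return concepts


def confidence(antecedent, consequent, ctx):
    """Confidence of the rule antecedent -> consequent."""
    covered = extent(set(antecedent), ctx)
    both = extent(set(antecedent) | set(consequent), ctx)
    return len(both) / len(covered) if covered else 0.0


if __name__ == "__main__":
    for ext, itt in sorted(iceberg_concepts(context, 0.5),
                           key=lambda c: -len(c[0])):
        print(sorted(ext), sorted(itt))
    print(confidence({"low_neuroticism"}, {"positive"}, context))
```

In this toy context the filter at support 0.5 retains three concepts, and the rule from low neuroticism to positive sentiment holds in every week that contains the antecedent. A realistic analysis would replace the brute-force enumeration with a dedicated closed-itemset or lattice algorithm.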
