Basic values are concepts or beliefs that pertain to desirable end-states and transcend specific situations. Studying personal values in social media can illuminate how and why societal values evolve, especially when stimulus-based methods such as surveys are inefficient, for instance in hard-to-reach populations. On the other hand, user-generated content relies heavily on stereotyped, culturally defined speech constructions rather than authentic expressions of personal values. We aimed to find a model that can accurately detect value-expressive posts on the Russian social media platform VKontakte. A training dataset of 5,035 posts was annotated by three experts, 304 crowd-workers, and ChatGPT. Crowd-workers and experts showed only moderate agreement in categorizing posts. ChatGPT was more consistent but struggled with spam detection. We applied an ensemble of human- and AI-assisted annotation involving an active learning approach, subsequently trained several LLMs, and selected a model based on embeddings from a pre-trained, fine-tuned rubert-tiny2. The selected model reached a high quality of value detection, with F1 = 0.75 (F1-macro = 0.80). This model provides a crucial step toward the study of values within and between Russian social media users.
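
The two reported scores can differ because they average over different things. The sketch below is only an illustration under assumptions: it takes F1 to be the score for the value-expressive class and F1-macro to be the unweighted mean of per-class F1 scores, which the abstract does not spell out. The labels (1 = value-expressive, 0 = other) and the toy predictions are hypothetical and are not the study's data.

```python
from typing import Hashable, Sequence


def f1_for_class(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                 label: Hashable) -> float:
    """F1 for one class, treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    denom = 2 * tp + fp + fn
    # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the class never occurs.
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, lab) for lab in labels) / len(labels)


# Hypothetical toy data: 3 value-expressive posts (1) and 7 other posts (0).
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

positive_f1 = f1_for_class(y_true, y_pred, 1)
macro = macro_f1(y_true, y_pred)
print(f"F1 (positive class) = {positive_f1:.2f}, F1-macro = {macro:.2f}")
```

In this toy case the macro score is higher than the positive-class score because the more frequent negative class is classified more accurately, which is one way a gap like the reported one could arise.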