While communication research frequently studies latent message features like moral appeals, their quantification remains a challenge. Conventional human coding struggles with scalability and intercoder reliability. While dictionary-based methods are cost-effective and computationally efficient, they often lack contextual sensitivity and are limited by the vocabularies developed for their original applications. In this paper, we present a novel approach to constructing vec-tionary measurement tools that boost validated dictionaries with word embeddings through nonlinear optimization. By harnessing the semantic relationships encoded by embeddings, vec-tionaries improve the measurement of latent message features by extending the applicability of the original vocabularies to other contexts. Vec-tionaries can also help extract semantic information from texts, especially short-format texts, beyond the original vocabulary of a dictionary. Importantly, a vec-tionary can produce additional metrics that capture the valence and ambivalence of a latent feature beyond its strength in texts. Using moral appeals in COVID-19-related tweets as a case study, we illustrate the steps to construct the moral foundations vec-tionary, showcasing its ability to process posts missed by dictionary methods and to produce measurements better aligned with crowdsourced human assessments. Furthermore, the additional metrics from the moral foundations vec-tionary yielded unique insights that helped predict outcomes such as message retransmission.