Social media platforms play a pivotal role in shaping political discourse, but analyzing their vast and rapidly evolving content remains a major challenge. We introduce an end-to-end framework for automatically generating an interpretable topic taxonomy from an unlabeled corpus. By combining unsupervised clustering with prompt-based labeling, our method leverages large language models (LLMs) to iteratively construct a taxonomy without requiring seed sets or domain expertise. We apply this framework to a large corpus of Meta (formerly Facebook) political ads from the month preceding the 2024 U.S. presidential election. Our approach uncovers latent discourse structures, synthesizes semantically rich topic labels, and annotates topics with moral framing dimensions. We present quantitative and qualitative analyses that demonstrate the effectiveness of our framework. Our findings reveal that voting and immigration ads dominate overall spending and impressions, while abortion and election-integrity ads achieve disproportionate reach. Funding patterns are equally polarized: economic appeals are driven mainly by conservative PACs, abortion messaging splits between pro- and anti-abortion-rights coalitions, and crime-and-justice campaigns are fragmented across local committees. The framing of these appeals also diverges: abortion ads emphasize liberty/oppression rhetoric, while economic messaging blends care/harm, fairness/cheating, and liberty/oppression narratives. Topic salience further reveals strong correlations between moral foundations and specific issues, and demographic targeting patterns also emerge. This work supports scalable, interpretable analysis of political messaging on social media, enabling researchers, policymakers, and the public to better understand emerging narratives, polarization dynamics, and the moral underpinnings of digital political communication.
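To make the "cluster, then label" idea concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. It swaps in greedy leader clustering on Jaccard token similarity for the paper's unsupervised clustering, and replaces the LLM prompt with a hypothetical `keyword_labeler` that simply returns a cluster's most frequent token. The names `leader_cluster`, `keyword_labeler`, and `build_taxonomy`, the stopword list, and the `threshold` value are all illustrative assumptions. Clusters that receive the same label are merged, a crude stand-in for the iterative taxonomy consolidation the framework performs with an LLM.

```python
from collections import Counter, defaultdict
from collections.abc import Callable

# Illustrative stopword list (assumption, not from the paper).
STOPWORDS = {"a", "an", "and", "by", "for", "in", "now", "of", "on", "the", "to", "your"}


def tokens(text: str) -> list[str]:
    """Lowercased alphabetic tokens in text order, minus stopwords."""
    words = (w.strip(".,!?:;").lower() for w in text.split())
    return [w for w in words if w.isalpha() and w not in STOPWORDS]


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two token sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def leader_cluster(docs: list[str], threshold: float) -> list[list[int]]:
    """Greedy single-pass clustering (swapped-in technique): each document
    joins the first cluster whose leader is similar enough, else starts one."""
    leaders: list[set[str]] = []
    clusters: list[list[int]] = []
    for i, doc in enumerate(docs):
        toks = set(tokens(doc))
        for leader, members in zip(leaders, clusters):
            if jaccard(toks, leader) >= threshold:
                members.append(i)
                break
        else:
            leaders.append(toks)
            clusters.append([i])
    return clusters


def keyword_labeler(texts: list[str]) -> str:
    """Hypothetical stand-in for an LLM labeling prompt: most frequent token
    (ties go to the token seen first)."""
    counts = Counter(t for text in texts for t in tokens(text))
    return counts.most_common(1)[0][0]


def build_taxonomy(
    docs: list[str],
    label_fn: Callable[[list[str]], str] = keyword_labeler,
    threshold: float = 0.2,
) -> dict[str, list[int]]:
    """Cluster documents, label each cluster, and merge clusters sharing a label."""
    taxonomy: defaultdict[str, list[int]] = defaultdict(list)
    for members in leader_cluster(docs, threshold):
        label = label_fn([docs[i] for i in members])
        taxonomy[label].extend(members)
    return dict(taxonomy)


if __name__ == "__main__":
    ads = [
        "Vote early in the election",
        "Vote by mail in the election, vote today",
        "Secure the border",
        "Protect the border now",
    ]
    print(build_taxonomy(ads))  # {'vote': [0, 1], 'border': [2, 3]}
```

In a real pipeline, `label_fn` would wrap a prompt to an LLM that sees sample documents from the cluster, and the keyword heuristic exists only so the sketch runs without external services.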