Interpretable Semantic Gradients in SSD: A PCA Sweep Approach and a Case Study on AI Discourse

Supervised Semantic Differential (SSD) is a mixed quantitative-interpretive method that models how text meaning varies with continuous individual-difference variables. It estimates a semantic gradient in an embedding space and interprets the gradient's poles through clustering and text retrieval. SSD applies PCA before regression, but no systematic method currently exists for choosing the number of retained components K, which introduces avoidable researcher degrees of freedom into the analysis pipeline. We propose a PCA sweep procedure that treats dimensionality selection as a joint criterion over representation capacity, gradient interpretability, and stability across nearby values of K. We illustrate the method on a corpus of short posts about artificial intelligence written by Prolific participants who also completed Admiration and Rivalry narcissism scales. The sweep yields a stable, interpretable Admiration-related gradient that contrasts optimistic, collaborative framings of AI with distrustful and derisive discourse, while no robust alignment emerges for Rivalry. We also show that a counterfactual analysis using a heuristic high-dimensional PCA solution instead produces diffuse, weakly structured clusters, reinforcing the value of the sweep-based choice of K. The case study shows how the PCA sweep constrains researcher degrees of freedom while preserving SSD's interpretive aims, supporting transparent and psychologically meaningful analyses of connotative meaning.
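The abstract does not spell out how the sweep is computed, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes document embeddings `X` and a continuous trait score `y` (both synthetic here), and it uses three stand-in diagnostics that are our own assumptions: cumulative explained variance as a proxy for representation capacity, in-sample R² for fit, and the cosine similarity between gradients at consecutive values of K as a proxy for stability. The function name `pca_sweep` and the grid `ks` are hypothetical. Gradient interpretability, which SSD assesses through clustering and text retrieval, is not covered.

```python
import numpy as np


def pca_sweep(X, y, ks):
    """For each K, regress y on the top-K principal component scores of X
    and map the coefficients back into the original embedding space."""
    Xc = X - X.mean(axis=0)          # center embeddings
    yc = y - y.mean()                # center the outcome
    # Rows of Vt are the principal axes, ordered by decreasing variance.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    total_var = float((S ** 2).sum())

    results, prev_grad = [], None
    for k in ks:
        Z = Xc @ Vt[:k].T                              # (n, k) component scores
        beta, *_ = np.linalg.lstsq(Z, yc, rcond=None)  # OLS on the scores
        resid = yc - Z @ beta
        r2 = 1.0 - float(resid @ resid) / float(yc @ yc)

        grad = Vt[:k].T @ beta                         # gradient in embedding space
        norm = np.linalg.norm(grad)
        if norm > 0:
            grad = grad / norm

        # Cosine similarity with the gradient at the previous K in the grid.
        cos_prev = float(grad @ prev_grad) if prev_grad is not None else float("nan")
        results.append({
            "k": k,
            "explained_var": float((S[:k] ** 2).sum()) / total_var,
            "r2": r2,
            "cos_prev": cos_prev,
        })
        prev_grad = grad
    return results


# Synthetic stand-in data: 200 "posts" with 50-dimensional embeddings.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50)) * np.linspace(3.0, 0.5, 50)
y = X @ rng.normal(size=50) * 0.1 + rng.normal(size=200)

results = pca_sweep(X, y, ks=[5, 10, 20, 40])
for row in results:
    print(row)
```

In-sample R² cannot decrease as K grows, so it would not serve as a selection criterion by itself. A cross-validated fit measure, read alongside the stability column, would be closer to the joint criterion described above.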

