Semantic representations are integral to natural language processing, psycholinguistics, and artificial intelligence. Although such representations are often derived from internet text, recent years have seen growing interest in behavior-based (e.g., free associations) and brain-based (e.g., fMRI) representations, which promise to improve our ability to measure and model human representations. We carry out the first systematic evaluation of the similarities and differences between semantic representations derived from text, behavior, and brain data. Using representational similarity analysis, we show that word vectors derived from behavior and brain data encode information that differs from that of their text-derived counterparts. Furthermore, drawing on our psychNorms metabase and an interpretability method that we call representational content analysis, we find that behavior representations in particular capture unique variance on certain affective, agentic, and socio-moral dimensions. We thus establish behavior as an important complement to text for capturing human representations and behavior. These results are broadly relevant to research aimed at learning human-aligned semantic representations, including work on evaluating and aligning large language models.
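To make the comparison concrete, the following is a minimal sketch of representational similarity analysis between two word-vector spaces. It is an illustration under stated assumptions, not the pipeline used in this work: the choice of cosine similarity and Spearman correlation, the function names, and the toy vectors are all hypothetical. The idea is that each space is first reduced to its word-by-word similarity structure, so spaces with different dimensionalities (text, behavior, brain) can be compared on the same set of word pairs.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pairwise_similarities(space, words):
    """Similarity of every unordered word pair, as a flat list."""
    return [cosine(space[w1], space[w2]) for w1, w2 in combinations(words, 2)]


def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rsa(space_a, space_b):
    """Spearman correlation between the similarity structures of two spaces.

    Only words present in both spaces are compared (an assumption of this sketch).
    """
    words = sorted(set(space_a) & set(space_b))
    sims_a = pairwise_similarities(space_a, words)
    sims_b = pairwise_similarities(space_b, words)
    return pearson(ranks(sims_a), ranks(sims_b))


# Hypothetical toy vectors; real spaces would have many more words and dimensions.
text_space = {"dog": [1.0, 0.0], "cat": [0.9, 0.1], "car": [0.0, 1.0], "bus": [0.1, 0.9]}
behavior_space = {
    "dog": [0.8, 0.1, 0.3],
    "cat": [0.7, 0.2, 0.4],
    "car": [0.1, 0.9, 0.2],
    "bus": [0.3, 0.8, 0.1],
}

if __name__ == "__main__":
    print(rsa(text_space, behavior_space))
```

A score near 1 would indicate that two spaces order word pairs by similarity in nearly the same way, while lower scores indicate that they encode different relational information. Because only the similarity structure is compared, the score is unaffected by how the axes of either space are arranged.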