This research introduces the Multilevel Embedding Association Test (ML-EAT), a method designed for interpretable and transparent measurement of intrinsic bias in language technologies. The ML-EAT addresses the ambiguity and difficulty of interpreting the traditional EAT measurement by quantifying bias at three levels of increasing granularity: the differential association of two target concepts with two attribute concepts; the individual effect size of each target concept with the two attribute concepts; and the association between each individual target concept and each individual attribute concept. Using the ML-EAT, this research defines a taxonomy of EAT patterns describing the nine possible outcomes of an embedding association test, each of which is associated with a unique EAT-Map, a novel four-quadrant visualization for interpreting the ML-EAT. Empirical analysis of static and diachronic word embeddings, GPT-2 language models, and a CLIP language-and-image model shows that EAT patterns add otherwise unobservable information about the component biases that make up an EAT; reveal the effects of prompting in zero-shot models; and can identify situations in which cosine similarity is an ineffective metric, rendering an EAT unreliable. Our work contributes a method for making bias more observable and interpretable, improving the transparency of computational investigations into human minds and societies.
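The three levels described above can be sketched in code. Below is a minimal, illustrative implementation of the underlying EAT statistics (the Caliskan et al.-style effect size based on cosine similarity), decomposed into the three granularities the abstract describes; function names and the use of raw cosine means at Level 3 are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def cos(u, v):
    # Cosine similarity between two embedding vectors.
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def association(w, A, B):
    # Level-2-style per-word association: mean cosine similarity with
    # attribute set A minus mean cosine similarity with attribute set B.
    return np.mean([cos(w, a) for a in A]) - np.mean([cos(w, b) for b in B])

def eat_effect_size(X, Y, A, B):
    # Level 1: differential association of target sets X vs. Y with
    # attributes A vs. B, normalized by the pooled standard deviation
    # of per-word associations (a Cohen's-d-style effect size).
    assoc = [association(w, A, B) for w in X + Y]
    return (np.mean(assoc[:len(X)]) - np.mean(assoc[len(X):])) / np.std(assoc)

def level3(T, A):
    # Level 3 (assumed form): summary of the raw cosine similarities
    # between one target concept T and one attribute concept A.
    sims = [cos(t, a) for t in T for a in A]
    return float(np.mean(sims)), float(np.std(sims))
```

With equal-sized target sets, the Level-1 effect size is antisymmetric in the targets and bounded in magnitude by 2, which is what makes the nine-pattern taxonomy (positive, near-zero, or negative association at each level) well defined.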