A Bayesian approach to uncertainty in word embedding bias estimation

Several measures, such as WEAT or MAC, attempt to quantify the magnitude of bias in word embeddings with a single-number metric. However, these metrics and the associated statistical significance calculations treat pre-averaged data as individual data points and rely on bootstrapping with small sample sizes. We show that such methods readily produce similar results even when the data are generated by a null model that lacks the intended bias. Consequently, we argue that this approach generates false confidence. To address this issue, we propose a Bayesian alternative: hierarchical Bayesian modeling, which enables a more uncertainty-sensitive inspection of bias in word embeddings at different levels of granularity. To showcase our method, we apply it to the Religion, Gender, and Race word lists from the original research, together with our own neutral control word lists. We deploy the method using Google, GloVe, and Reddit embeddings. Further, we use our approach to evaluate a debiasing technique applied to the Reddit word embeddings. Our findings reveal a more complex landscape than suggested by the proponents of single-number metrics. The datasets and source code for the paper are publicly available.
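To make the concern about null models concrete, the following is a minimal sketch, not the paper's code. It computes the standard WEAT effect size on word vectors drawn at random from an isotropic Gaussian, so no bias is present by construction. The function names, the Gaussian null model, the list size of 8 words, and the dimension of 50 are all assumptions chosen for illustration; the null model used in the paper may differ.

```python
import numpy as np


def cosine_matrix(U, V):
    """Pairwise cosine similarities between the rows of U and the rows of V."""
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    V = V / np.linalg.norm(V, axis=1, keepdims=True)
    return U @ V.T


def weat_effect_size(X, Y, A, B):
    """WEAT effect size for target sets X, Y and attribute sets A, B.

    Each argument is an array of shape (n_words, dim).
    """
    def assoc(W):
        # s(w, A, B): mean cosine to A minus mean cosine to B, per word.
        return cosine_matrix(W, A).mean(axis=1) - cosine_matrix(W, B).mean(axis=1)

    sx, sy = assoc(X), assoc(Y)
    pooled = np.concatenate([sx, sy])
    # Difference of per-word mean associations, scaled by the pooled
    # standard deviation: the words' averaged scores act as data points.
    return (sx.mean() - sy.mean()) / pooled.std(ddof=1)


def null_effect_sizes(n_trials=1000, n_words=8, dim=50, seed=0):
    """Effect sizes for random vectors (hypothetical null model, no bias)."""
    rng = np.random.default_rng(seed)
    out = np.empty(n_trials)
    for i in range(n_trials):
        X, Y, A, B = (rng.standard_normal((n_words, dim)) for _ in range(4))
        out[i] = weat_effect_size(X, Y, A, B)
    return out


if __name__ == "__main__":
    d = null_effect_sizes()
    print("mean |d| under the null:", np.abs(d).mean())
    print("share of trials with |d| > 0.5:", (np.abs(d) > 0.5).mean())
```

Running the script reports how often a sizeable effect size arises purely by chance with short word lists. A hierarchical Bayesian model, as proposed in the abstract, would instead model the individual word-pair similarities and report posterior uncertainty rather than a single number.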

