As social issues related to gender bias attract closer scrutiny, accurate tools for determining the gender profile of large groups become essential. When explicit data is unavailable, gender is often inferred from names. Current methods assign each individual in the group, one at a time, a gender label or probability based on gender-name correlations observed in the population at large. We show that this strategy is logically inconsistent and has practical shortcomings, the most notable of which is the systematic underestimation of gender bias. We introduce a global inference strategy that estimates gender composition from the context of the full list of names. The tool is free of intrinsic methodological artifacts, robust against errors, easy to implement, and computationally light.
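The contrast between the two strategies can be illustrated with a small sketch. It is not the authors' implementation: it assumes the global estimate can be modeled as the maximum-likelihood mixing weight of a two-component mixture, found by expectation-maximization, and every name, probability, and function name below is hypothetical and invented for illustration.

```python
# Hypothetical population statistics P(name | gender); values are invented.
P_NAME_GIVEN_F = {"alex": 0.20, "maria": 0.70, "john": 0.10}
P_NAME_GIVEN_M = {"alex": 0.30, "maria": 0.05, "john": 0.65}


def per_name_estimate(names, prior_f=0.5):
    """Individual strategy: average P(female | name) over the list,
    with each posterior computed from a fixed prior female fraction."""
    total = 0.0
    for name in names:
        pf = prior_f * P_NAME_GIVEN_F[name]
        pm = (1.0 - prior_f) * P_NAME_GIVEN_M[name]
        total += pf / (pf + pm)
    return total / len(names)


def global_estimate(names, iters=200):
    """Global strategy (assumed form): find the female fraction f that
    maximizes the likelihood of the whole list. The EM update for a
    mixing weight is the per-name average evaluated with the current f
    as the prior, so we iterate that map to its fixed point."""
    f = 0.5
    for _ in range(iters):
        f = per_name_estimate(names, prior_f=f)
    return f


# A synthetic group of 100 people whose true female fraction is 0.20,
# built from the expected name counts under the tables above.
names = ["alex"] * 28 + ["maria"] * 18 + ["john"] * 54

print(round(per_name_estimate(names), 3))  # 0.352, pulled toward the 0.5 prior
print(round(global_estimate(names), 3))    # 0.2, the true fraction
```

In this toy setting the per-name average reports a group that looks more balanced than it is, because each posterior is computed with the population prior rather than the group's own composition. Iterating the same computation until the assumed prior agrees with its own output removes that inconsistency.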