Chemical named entity recognition (NER) models are used in many downstream tasks, from adverse drug reaction identification to pharmacoepidemiology. However, it is unknown whether these models perform equally well for everyone. Performance disparities can cause harm rather than the intended good. This paper assesses gender-related performance disparities in chemical NER systems. We develop a framework for measuring gender bias in chemical NER models using synthetic data and a newly annotated corpus of 92,405 words from Reddit with self-identified gender information. Our evaluation of multiple biomedical NER models reveals clear biases. For instance, the synthetic data suggest that female-related names are frequently misclassified as chemicals, especially when they coincide with brand names. We also observe performance disparities between female- and male-associated data in both datasets. Many systems fail to detect mentions of contraceptives such as birth control. Our findings highlight the biases in chemical NER models and urge practitioners to account for them in downstream applications.
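The synthetic-data part of such an evaluation could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the framework described in the abstract: the templates, the name lists, and `toy_tagger` (a dictionary lookup standing in for a real chemical NER model) are all hypothetical. The sketch fills sentence templates with names from each group and reports how often a name is itself tagged as a chemical.

```python
from typing import Callable, Dict, List

# Hypothetical templates: "{name}" is a person name, "{chem}" a chemical mention.
TEMPLATES = [
    "{name} said that {chem} made the headaches worse.",
    "{name} has been taking {chem} for two weeks.",
]

# Illustrative name lists only; a real study would use a much larger sample.
NAME_GROUPS: Dict[str, List[str]] = {
    "female": ["Yasmin", "Maria"],
    "male": ["John", "David"],
}


def toy_tagger(sentence: str) -> List[str]:
    """Stand-in for a chemical NER model: returns tokens found in a small lexicon.

    "yasmin" is included because Yasmin is both a given name and a
    contraceptive brand name, which is the kind of collision at issue.
    """
    lexicon = {"ibuprofen", "yasmin"}
    tokens = [tok.strip(".,") for tok in sentence.split()]
    return [tok for tok in tokens if tok.lower() in lexicon]


def name_false_positive_rate(
    tagger: Callable[[str], List[str]],
    names: List[str],
    chemical: str = "ibuprofen",
) -> float:
    """Fraction of templated sentences in which the person name is tagged as a chemical."""
    errors = 0
    total = 0
    for name in names:
        for template in TEMPLATES:
            sentence = template.format(name=name, chem=chemical)
            predicted = {mention.lower() for mention in tagger(sentence)}
            if name.lower() in predicted:
                errors += 1
            total += 1
    return errors / total if total else 0.0


# Per-group false positive rates and the gap between the two groups.
rates = {
    group: name_false_positive_rate(toy_tagger, names)
    for group, names in NAME_GROUPS.items()
}
gap = rates["female"] - rates["male"]
print(rates, gap)
```

To evaluate an actual model, `toy_tagger` would be replaced by a function that wraps the model and returns the chemical mentions it predicts for a sentence; the same per-group comparison can then be repeated with precision, recall, or F1 on annotated data.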