Identifying and Mitigating Gender Cues in Academic Recommendation Letters: An Interpretability Case Study

Letters of recommendation (LoRs) can carry implicitly gendered language that may inadvertently influence downstream decisions, e.g., in hiring and admissions. In this work, we investigate the extent to which Transformer-based encoder models and Large Language Models (LLMs) can infer the gender of applicants from academic LoRs submitted to a U.S. medical-residency program after explicit identifiers such as names and pronouns are de-gendered. Using three models (DistilBERT, RoBERTa, and Llama 2) to classify applicant gender from anonymized and de-gendered LoRs, we observe significant gender leakage, with classification accuracy of up to 68%. Text interpretation methods such as TF-IDF and SHAP show that certain linguistic patterns are strong proxies for gender; for example, "emotional" and "humanitarian" are commonly associated with LoRs written for female applicants. As an experiment in creating truly gender-neutral LoRs, we removed these implicit gender cues and re-trained the classifiers, which resulted in a drop of up to 5.5% in accuracy and 2.7% in macro $F_1$ score. However, applicant gender prediction remains better than chance. Our findings in this case study highlight that (1) LoRs contain gender-identifying cues that are hard to remove and may activate bias in decision-making, and (2) while our technical framework may be a concrete step toward fairer academic and professional evaluations, future work is needed to interrogate the role that gender plays in LoR review. Taken together, our findings motivate upstream auditing of evaluative text in real-world academic letters of recommendation as a necessary complement to model-level fairness interventions.
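To make the cue-identification and cue-removal steps concrete, the following is a minimal sketch and not the study's pipeline: it uses a class-level TF-IDF variant in plain Python to surface terms that are distinctive for one group of letters, then masks them. The study itself combines TF-IDF with SHAP and re-trained classifiers. The function names (`class_tfidf`, `top_cues`, `mask_terms`), the `[MASK]` placeholder, the group labels, and the toy sentences are all assumptions introduced for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def class_tfidf(docs_by_class):
    """Score terms per class: within-class term frequency times a smoothed
    inverse class frequency (a class-level TF-IDF variant, assumed here)."""
    counts = {
        label: Counter(tok for doc in docs for tok in tokenize(doc))
        for label, docs in docs_by_class.items()
    }
    n_classes = len(counts)
    scores = {}
    for label, cnt in counts.items():
        total = sum(cnt.values())
        scores[label] = {}
        for term, n in cnt.items():
            # Number of classes in which the term appears at least once.
            df = sum(1 for other in counts.values() if term in other)
            idf = math.log((1 + n_classes) / (1 + df)) + 1
            scores[label][term] = (n / total) * idf
    return scores


def top_cues(scores, label, k):
    """Return the k terms whose score for `label` most exceeds their
    best score in any other class."""
    others = [c for c in scores if c != label]
    margin = {
        term: s - max((scores[o].get(term, 0.0) for o in others), default=0.0)
        for term, s in scores[label].items()
    }
    ranked = sorted(margin.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def mask_terms(text, terms, placeholder="[MASK]"):
    """Replace whole-word occurrences of the cue terms with a placeholder."""
    if not terms:
        return text
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(placeholder, text)


# Invented toy sentences for illustration only; not drawn from the study's letters.
toy_letters = {
    "group_a": [
        "The applicant is humanitarian and diligent.",
        "A humanitarian and emotional resident.",
    ],
    "group_b": [
        "The applicant is decisive and diligent.",
        "A decisive and skilled resident.",
    ],
}

scores = class_tfidf(toy_letters)
cues = top_cues(scores, "group_a", 2)
print(cues)
print(mask_terms("A humanitarian and emotional resident.", cues))
```

A sketch like this only removes surface-level lexical cues. It would not capture cues spread across phrasing or context, which is consistent with the finding that gender prediction stays above chance after cue removal.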
